This book is an introductory textbook in probability and statistical inference.
No prior knowledge of either probability or statistics is required, although 
prior exposure to an elementary precalculus course would prove beneficial in 
the sense that the student would not see the basic concepts discussed here for 
the first time. 

The mathematical prerequisite is a year of calculus and familiarity with 
the basic concepts and some results of linear algebra. Elementary differential 
and integral calculus will suffice for the majority of the book. In some parts, 
such as Chapters 4, 5, and 6, the concept of a multiple integral is used. Also, 
in Chapter 6, the student is expected to be at least vaguely familiar with the 
basic techniques of changing variables in a single or a multiple integral. 


Chapter Descriptions


The material discussed in this book is enough for a one-year course in introduc- 
tory probability and statistical inference. It consists of a total of 15 chapters. 
Chapters 1 through 7 are devoted to probability, distributional theory, and 
related topics. Chapters 9 through 14 discuss the standard topics of para- 
metric statistical inference, namely point estimation, interval estimation, and 
testing hypotheses. This is done first in a general setting and then in the special 
models of linear regression and analysis of variance. Chapter 15 is devoted to 
discussing selected topics from nonparametric inference. 


This book has a number of features that differentiate it from existing books. 
First, the material is arranged in such a manner that Chapters 1 through 8 can 
be used independently for an introductory course in probability. The desirable 
duration for such a course would be a semester, although a quarter would 
also be long enough if some of the proofs were omitted. Chapters 1 through 7
would suffice for this purpose. The centrally placed Chapter 8 plays a twofold 
role. First, it serves as a window into what statistical inference is all about 
for those taking only the probability part of the course. Second, it paints a 
fairly broad picture of the material discussed in considerable detail in the 
subsequent chapters. Accordingly and purposely, no specific results are stated, 
no examples are discussed, no exercises are included. All these things are done 
in the chapters following it. As already mentioned, the sole objective here is 
to take the reader through a brief orientation trip to statistical inference; to 
indicate why statistical inference is needed in the first place, how the relevant 
main problems are formulated, and how we go about resolving them. 

The second differentiating feature of the book is the relatively large 
number of examples discussed in detail. There are more than 220 such exam- 
ples, not including scores of numerical examples and applications. The first 
chapter alone is replete with 44 examples selected from a variety of applica- 
tions. Their purpose is to impress upon the student the breadth of applications 
of probability and statistics, to draw attention to the wide range of applica- 
tions where probabilistic and statistical questions are pertinent. At this stage, 
one could not possibly provide answers to the questions posed without the 
methodology developed in the subsequent chapters. Answers to these ques- 
tions are given in the form of examples and exercises throughout the remaining 
chapters. 

The book contains more than 560 exercises placed strategically at the ends 
of sections. The exercises are closely related to the material discussed in the 
respective sections, and they vary in the degree of difficulty. Detailed solutions 
to all of them are available in the form of a Solutions Manual for the instructors
of the course, when this textbook is used. Brief answers to even-numbered 
exercises are provided at the end of the book. Also included in the textbook 
are approximately 60 figures that help illustrate some concepts and operations. 

Still another desirable feature of this textbook is the effort made to mini- 
mize the so-called arm waving. This is done by providing a substantial number 
of proofs, without ever exceeding the mathematical prerequisites set. This 
also helps ameliorate the not so unusual phenomenon of insulting students’ 
intelligence by holding them incapable of following basic reasoning. 

Regardless of the effort made by the author of an introductory book in 
probability and statistics to cover the largest possible number of areas where 
probability and statistics apply, such a goal is unlikely to be attained. Conse- 
quently, no such textbook will ever satisfy students who focus exclusively on 
their own area of interest. It is also expected that this book will come as a 
disappointment to students who are oriented more toward vocational training
than toward college or university education. This book is not meant to codify
answers to questions in the form of framed formulas and prescription recipes. 
Rather, its purpose is to introduce the student to a thinking process and guide 
her or him toward the answer sought to a posed question. To paraphrase a 
Chinese saying, if you are taught how to fish, you eat all the time, whereas if 
you are given a fish, you eat only once. 




On several occasions the reader is referred for proofs and more comprehensive
treatment of some topics to the book A Course in Mathematical Statistics,
2nd edition (1997), Academic Press, by G.G. Roussas. This reference book
was originally written for the same audience as that of the present book. How- 
ever, circumstances dictated the adjustment of the level of the reference book 
to match the mathematical preparation of the anticipated audience. 

On the practical side, a number of points of information are given here. 
Thus, log x (logarithm of x), whenever it occurs, is always the natural logarithm
of x (the logarithm of x with base e), whether it is explicitly stated or not. 

The rule followed in the use of decimal numbers is that we retain three
decimal digits, the last of which is rounded up to the next higher digit if
the fourth (omitted) decimal digit is greater than or equal to 5. An exception
to this rule is made when the division is exact, and also when the numbers are
read out of tables. The book is supplied with an appendix consisting of excerpts of tables:
Binomial tables, Poisson tables, Normal tables, t-tables, Chi-Square tables, 
and F-tables. The last table, Table 7, consists of a list of certain often-occurring 
distributions along with some of their characteristics. The appendix is followed 
by a list of some notation and abbreviations extensively used throughout the 
book, and the body of the book is concluded with brief answers to the even- 
numbered exercises. 

In closing, a concerted effort has been made to minimize the number of 
inevitable misprints and oversights in the book. We have no illusion, however, 
that the book is free of them. This author would greatly appreciate being 
informed of any errors; such errors will be corrected in a subsequent printing 
of the book. 
Chapter 1 Some Motivating Examples and Some Fundamental Concepts


This chapter consists of three sections. The first section is devoted to present- 
ing a number of examples (25 to be precise), drawn from a broad spectrum 
of human activities. Their purpose is to demonstrate the wide applicability of 
probability and statistics. In the formulation of these examples, certain terms, 
such as at random, average, data fit by a line, event, probability (estimated 
probability, probability model), rate of success, sample, and sampling (sample 
size), are used. These terms are presently to be understood in their everyday 
sense, and will be defined precisely later on. 

In the second section, some basic terminology and fundamental quantities 
are introduced and are illustrated by means of examples. In the closing section, 
the concept of a random variable is defined and is clarified through a number 
of examples. 


1.1 Some Motivating Examples


EXAMPLE 1
In a certain state of the Union, n landfills are classified according to their
concentration of three hazardous chemicals: arsenic, barium, and mercury. 
Suppose that the concentration of each one of the three chemicals is charac- 
terized as either high or low. Then some of the questions which can be posed 
are as follows: (i) If a landfill is chosen at random from among the n, what 
is the probability it is of a specific configuration? In particular, what is the 
probability that it has: (a) High concentration of barium? (b) High concentra- 
tion of mercury and low concentration of both arsenic and barium? (c) High 
concentration of any two of the chemicals and low concentration of the third?
(d) High concentration of any one of the chemicals and low concentration of 
the other two? (ii) How can one check whether the proportions of the landfills 
falling into each one of the eight possible configurations (regarding the levels 
of concentration) agree with a priori stipulated numbers? 


EXAMPLE 2
Suppose a disease is present in $100p_1\%$ ($0 < p_1 < 1$) of a population. A
diagnostic test is available but is yet to be perfected. The test shows $100p_2\%$
false positives ($0 < p_2 < 1$) and $100p_3\%$ false negatives ($0 < p_3 < 1$). That
is, for a patient not having the disease, the test shows positive (+) with
probability $p_2$ and negative (−) with probability $1 - p_2$. For a patient having
the disease, the test shows “−” with probability $p_3$ and “+” with probability
$1 - p_3$. A person is chosen at random from the target population; let D be the
event that the person is diseased and N be the event that the person is not
diseased. Then, it is clear that some important questions are as follows: In
terms of $p_1$, $p_2$, and $p_3$: (i) Determine the probabilities of the following
configurations: D and +, D and −, N and +, N and −. (ii) Also, determine the
probability that a person will test + or the probability the person will test −.
(iii) If the person chosen tests +, what is the probability that he/she is
diseased? What is the probability that he/she is diseased, if the person tests −?
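
As a preview of how the questions in the diagnostic-test example can be handled once conditional probability has been developed, the following Python sketch computes the four joint probabilities, the probability of each test result, and the probability of disease given the test result. The function name and the numerical values used for the three probabilities are illustrative assumptions, not taken from the text.

```python
def diagnostic_probabilities(p1, p2, p3):
    """Probabilities for the diagnostic-test setting.

    p1: P(D), proportion of the population having the disease
    p2: P(+ | N), false-positive probability
    p3: P(- | D), false-negative probability
    """
    joint = {
        ("D", "+"): p1 * (1 - p3),
        ("D", "-"): p1 * p3,
        ("N", "+"): (1 - p1) * p2,
        ("N", "-"): (1 - p1) * (1 - p2),
    }
    # Probability of each test result, obtained by adding the joint probabilities.
    p_pos = joint[("D", "+")] + joint[("N", "+")]
    p_neg = joint[("D", "-")] + joint[("N", "-")]
    return {
        "joint": joint,
        "P(+)": p_pos,
        "P(-)": p_neg,
        "P(D|+)": joint[("D", "+")] / p_pos,
        "P(D|-)": joint[("D", "-")] / p_neg,
    }


# Hypothetical values, chosen only for illustration.
result = diagnostic_probabilities(p1=0.01, p2=0.05, p3=0.02)
print(result["P(+)"], result["P(D|+)"])
```

With a rare disease and these assumed error rates, most positive results come from the large disease-free group, so the probability of disease given a positive test stays well below 1.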


EXAMPLE 3
In the circuit drawn below, suppose that switch i = 1, ..., 5 turns on with
probability $p_i$ and independently of the remaining switches. What is the
probability of having current transferred from point A to point B?


[Figure: circuit diagram with five switches connecting point A to point B]

EXAMPLE 4
A travel insurance policy pays $1,000 to a customer in case of a loss due to
theft or damage on a 5-day trip. If the risk of such a loss is assessed to be 1 in 
200, what is a fair premium for this policy? 
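
One common reading of "fair" in the travel insurance example, assumed here, is that the premium equals the expected payout, so that neither party gains on average. Under that assumption the computation is a single multiplication, sketched below with exact fractions.

```python
from fractions import Fraction

payout = 1000               # dollars paid if a loss occurs (from the text)
p_loss = Fraction(1, 200)   # assessed risk of a loss (from the text)

# Expected payout per policy. Taking this as the premium makes the insurer's
# expected gain zero, which is the notion of "fair" assumed in this sketch.
fair_premium = payout * p_loss
print(fair_premium)
```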


EXAMPLE 5
Jones claims to have extrasensory perception (ESP). In order to test the claim,
a psychologist shows Jones five cards that carry different pictures. Then Jones
is blindfolded and the psychologist selects one card and asks Jones to identify
the picture. This process is repeated n times. Suppose, in reality, that Jones
has no ESP but responds by sheer guesses.


(i) Decide on a suitable probability model describing the number of correct 
responses. (ii) What is the probability that at most n/5 responses are correct? 
(iii) What is the probability that at least n/2 responses are correct? 
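
Later in the chapter the ESP example is placed in the binomial model. As a hedged preview, the sketch below evaluates questions (ii) and (iii) under that model with success probability 1/5 (one correct card out of five). The value n = 10 is a hypothetical choice, since the text leaves n unspecified, and the helper name is an assumption as well.

```python
from math import ceil, comb, floor


def binomial_pmf(k, n, p):
    """P(X = k) for the number X of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10      # hypothetical number of repetitions
p = 1 / 5   # probability of a correct guess among five cards

# (ii) probability of at most n/5 correct responses
p_at_most = sum(binomial_pmf(k, n, p) for k in range(0, floor(n / 5) + 1))
# (iii) probability of at least n/2 correct responses
p_at_least = sum(binomial_pmf(k, n, p) for k in range(ceil(n / 2), n + 1))

print(round(p_at_most, 4), round(p_at_least, 4))
```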


EXAMPLE 6
A government agency wishes to assess the prevailing rate of unemployment
in a particular county. It is felt that this assessment can be done quickly and 
effectively by sampling a small fraction n, say, of the labor force in the county. 
The obvious questions to be considered here are: (i) What is a suitable prob- 
ability model describing the number of unemployed? (ii) What is an estimate 
of the rate of unemployment? 


EXAMPLE 7
Suppose that, for a particular cancer, chemotherapy provides a 5-year survival
rate of 80% if the disease could be detected at an early stage. Suppose further 
that n patients, diagnosed to have this form of cancer at an early stage, are just 
starting the chemotherapy. Finally, let X be the number of patients among the 
n who survive 5 years. 


Then the following are some of the relevant questions which can be asked: 
(i) What are the possible values of X, and what are the probabilities that each 
one of these values is taken on? (ii) What is the probability that X takes values 
between two specified numbers a and b, say? (iii) What is the average number 
of patients to survive 5 years, and what is the variation around this average? 


EXAMPLE 8
An advertisement manager for a radio station claims that over 100p% (0 < p <
1) of all young adults in the city listen to a weekend music program. To establish 
this conjecture, a random sample of size n is taken from among the target 
population and those who listen to the weekend music program are counted. 


(i) Decide on a suitable probability model describing the number of young 
adults who listen to the weekend music program. (ii) On the basis of the 
collected data, check whether the claim made is supported or not. (iii) How 
large a sample size n should be taken to ensure that the estimated average and 
the true proportion do not differ in absolute value by more than a specified 
number with prescribed (high) probability? 


EXAMPLE 9
When the output of a production process is stable at an acceptable standard,
it is said to be “in control.” Suppose that a production process has been in 
control for some time and that the proportion of defectives has been p. As 
a means of monitoring the process, the production staff will sample n items. 
Occurrence of k or more defectives will be considered strong evidence for “out 
of control.” 


(i) Decide on a suitable probability model describing the number X of defec- 
tives; what are the possible values of X, and what is the probability that each of 
these values is taken on? (ii) On the basis of the data collected, check whether
or not the process is out of control. (iii) How large a sample size n should be 
taken to ensure that the estimated proportion of defectives will not differ in 
absolute value from the true proportion of defectives by more than a specified 
quantity with prescribed (high) probability? 


EXAMPLE 10
An electronic scanner is believed to be more efficient in determining flaws in
a material than a mechanical testing method which detects 100p% (0 < p < 1) 
of the flawed specimens. To determine its success rate, n specimens with 
flaws are tested by the electronic scanner. 


(i) Decide on a suitable probability model describing the number X of the 
flawed specimens correctly detected by the electronic scanner; what are the 
possible values of X, and what is the probability that each one of these values 
is taken on? (ii) Suppose that the electronic scanner detects correctly k out of 
n flawed specimens. Check whether or not the rate of success of the electronic 
scanner is higher than that of the mechanical device. 


EXAMPLE 11
At a given road intersection, suppose that X is the number of cars passing by
until an observer spots a particular make of a car (e.g., a Mercedes). 


Then some of the questions one may ask are as follows: (i) What are the 
possible values of X? (ii) What is the probability that each one of these values 
is taken on? (iii) How many cars would the observer expect to observe until 
the first Mercedes appears? 


EXAMPLE 12
A city health department wishes to determine whether the mean bacteria count
per unit volume of water at a lake beach is within the safety level of 200. A 
researcher collected n water samples of unit volume and recorded the bacteria 
counts. 


Relevant questions here are: (i) What is the appropriate probability model 
describing the number X of bacteria in a unit volume of water; what are the 
possible values of X, and what is the probability that each one of these values is 
taken on? (ii) Do the data collected indicate that there is no cause for concern? 


EXAMPLE 13
Consider an aptitude test administered to aircraft pilot trainees, which requires
a series of operations to be performed in quick succession. 


Relevant questions here are: (i) What is the appropriate probability model for
the time required to complete the test? (ii) What is the probability that the test
is completed in no less than $t_1$ minutes, say? (iii) What is the percentage of
candidates passing the test, if the test is to be completed within $t_2$ minutes, say?


EXAMPLE 14
Measurements of the acidity (pH) of rain samples were recorded at n sites in
an industrial region. 
(i) Decide on a suitable probability model describing the number X of the
acidity of rain measured. (ii) On the basis of the measurements taken, provide 
an estimate of the average acidity of rain in that region. 


EXAMPLE 15
To study the growth of pine trees at an early state, a nursery worker records n
measurements of the heights of 1-year-old red pine seedlings. 


(i) Decide on a suitable probability model describing the heights X of the pine 
seedlings. (ii) On the basis of the n measurements taken, determine average 
height of the pine seedlings. (iii) Also, check whether these measurements 
support the stipulation that the average height is a specified number. 


EXAMPLE 16
It is claimed that a new treatment is more effective than the standard treatment
for prolonging the lives of terminal cancer patients. The standard treatment 
has been in use for a long time, and from records in medical journals the mean 
survival period is known to have a certain numerical value (in years). The 
new treatment is administered to n patients, and their duration of survival is 
recorded. 


(i) Decide on suitable probability models describing the survival times X and 
Y under the old and the new treatments, respectively. (ii) On the basis of the 
existing journal information and the data gathered, check whether or not the 
claim made is supported. 


EXAMPLE 17
A medical researcher wishes to determine whether a pill has the undesirable
side effect of reducing the blood pressure of the user. The study requires 
recording the initial blood pressures of n college-age women. After the use of 
the pill regularly for 6 months, their blood pressures are again recorded. 


(i) Decide on suitable probability models describing the blood pressures, ini- 
tially and after the 6-month period. (ii) Do the observed data support the claim 
that the use of the pill reduces blood pressure? 


EXAMPLE 18
It is known that human blood is classified in four types denoted by A, B, AB,
and O. Suppose that the blood of n persons who have volunteered to donate 
blood at a plasma center has been classified in these four categories. Then a 
number of questions can be posed; some of them are: 


(i) What is the appropriate probability model to describe the distribution of 
the blood types of the n persons into the four types? (ii) What is the esti- 
mated probability that a person, chosen at random from among the n, has 
a specified blood type (e.g., O)? (iii) What are the proportions of the n per- 
sons falling into each one of the four categories? (iv) How can one check 
whether the observed proportions are in agreement with a priori stipulated 
numbers? 
EXAMPLE 19
The following record shows a classification of 41,208 births in Wisconsin
(courtesy of Professor Jerome Klotz). Set up a suitable probability model and 
check whether or not the births are uniformly distributed over all 12 months 
of the year. 


Month    Births    Month    Births
Jan.     3,478     July     3,476
Feb.     3,333     Aug.     3,495
March    3,771     Sept.    3,490
April    3,542     Oct.     3,331
May      3,479     Nov.     3,188
June     3,304     Dec.     3,321

Total    41,208
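
One standard way to compare observed counts with a uniform stipulation is the chi-square goodness-of-fit statistic, which the later chapters on inference are expected to develop. The sketch below only computes that statistic for the monthly birth counts. It assumes, for simplicity, that all 12 months are given equal probability 1/12, ignoring their differing lengths, and it does not carry out the test itself.

```python
# Monthly birth counts as tabulated above.
births = {
    "Jan": 3478, "Feb": 3333, "Mar": 3771, "Apr": 3542,
    "May": 3479, "Jun": 3304, "Jul": 3476, "Aug": 3495,
    "Sep": 3490, "Oct": 3331, "Nov": 3188, "Dec": 3321,
}

total = sum(births.values())
# Expected count per month if births were spread evenly over 12 equal months
# (an assumption of this sketch).
expected = total / len(births)

# Chi-square statistic: sum over months of (observed - expected)^2 / expected.
chi_square = sum((obs - expected) ** 2 / expected for obs in births.values())
degrees_of_freedom = len(births) - 1

print(total, expected, round(chi_square, 3), degrees_of_freedom)
```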


EXAMPLE 20
To compare the effectiveness of two diets A and B, 150 infants were included
in a study. Diet A was given to 80 randomly selected infants and diet B was 
given to the other 70 infants. At a later time, the health of each infant was 
observed and classified into one of the three categories: “excellent,” “average,” 
and “poor.” The frequency counts are tabulated as follows: 


HEALTH UNDER TWO DIFFERENT DIETS

          Excellent   Average   Poor   Sample Size
Diet A       37          24      19        80
Diet B       17          33      20        70
Total        54          57      39       150


Set up a suitable probability model for this situation, and, on the basis of the 
observed data, compare the effectiveness of the two diets. 


EXAMPLE 21
Osteoporosis (loss of bone minerals) is a common cause of broken bones in
the elderly. A researcher on aging conjectures that bone mineral loss can be 
reduced by regular physical therapy or by certain kinds of physical activity. A 
study is conducted on n elderly subjects of approximately the same age divided 
into control, physical therapy, and physical activity groups. After a suitable 
period of time, the nature of change in bone mineral content is observed. 


Set up a suitable probability model for the situation under consideration, and 
check whether or not the observed data indicate that the change in bone 
mineral varies for different groups. 


CHANGE IN BONE MINERAL

            Appreciable   Little    Appreciable
            Loss          Change    Increase      Total
Control         38          15          7           60
Therapy         22          32         16           70
Activity        15          30         25           70
Total           75          77         48          200
EXAMPLE 22
In the following table, the data x = undergraduate GPA and y = score in the
Graduate Management Aptitude Test (GMAT) are recorded. 


DATA OF UNDERGRADUATE GPA (x) 
AND GMAT SCORE (y) 


x y x y x y 
3.63 447 2.36 399 2.80 444 
3.59 588 2.36 482 3.13 416 
3.80 563 2.66 420 3.01 471 
3.40 553 2.68 414 2.79 490 
3.50 572 2.48 533 2.89 431 
3.78 591 2.46 509 2.91 446 
3.44 692 2.63 504 2.75 546 
3.48 528 2.44 336 2.73 467 
3.47 552 2.13 408 3.12 463 
3.85 520 2.41 469 3.08 440 
3.39 543 2.55 538 3.03 419 

3.00 509 


(i) Draw a scatter plot of the pairs (x, y). (ii) On the basis of part (i), set up 
a reasonable model for the representation of the pairs (x, y). (iii) Indicate 
roughly how this model can be used to predict a GMAT score on the basis of 
the corresponding GPA score. 


EXAMPLE 23
In an experiment designed to determine the relationship between the doses
of a compost fertilizer x and the yield y of a crop, n values of x and y are 
observed. On the basis of prior experience, it is reasonable to assume that the 
pairs (x, y) are fitted by a straight line, which can be determined by certain 
summary values of the data. Later on, it will be seen how this is specifically 
done and also how this model can be used for various purposes, including that 
of predicting a value of y on the basis of a given value of x. 


EXAMPLE 24
In an effort to improve the quality of recording tapes, the effects of four kinds
of coatings A, B, C, and D on the reproducing quality of sound are compared.
Twenty-two measurements of sound distortions are given in the following table.


SOUND DISTORTIONS OBTAINED 
WITH FOUR TYPES OF COATINGS 


Coating Observations 
A 10, 15, 8, 12, 15 
B 14, 18, 21, 15 
C 17, 16, 14, 15, 17, 15, 18 
D 12, 15, 17, 15, 16, 15 


In connection with these data, several questions may be posed (and will be 
posed later on). The most immediate of them all is the question of whether 
or not the data support the existence of any significant difference among the 
average distortions obtained using the four coatings. 
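
A first look at the coatings data can be taken before any formal methodology is available: compute the average distortion for each coating and split the total variation into a between-coatings part and a within-coatings part. The sketch below does this and forms the ratio used in one-way analysis of variance. That this ratio is the tool the book applies later is an assumption here, and the variable names are illustrative.

```python
from statistics import mean

# Sound distortions by coating, as tabulated above.
distortions = {
    "A": [10, 15, 8, 12, 15],
    "B": [14, 18, 21, 15],
    "C": [17, 16, 14, 15, 17, 15, 18],
    "D": [12, 15, 17, 15, 16, 15],
}

group_means = {coating: mean(values) for coating, values in distortions.items()}
all_values = [x for values in distortions.values() for x in values]
grand_mean = mean(all_values)

# Variation of the coating averages around the overall average.
ss_between = sum(
    len(values) * (group_means[coating] - grand_mean) ** 2
    for coating, values in distortions.items()
)
# Variation of the observations around their own coating average.
ss_within = sum(
    (x - group_means[coating]) ** 2
    for coating, values in distortions.items()
    for x in values
)

df_between = len(distortions) - 1
df_within = len(all_values) - len(distortions)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(group_means, grand_mean, ss_between, ss_within, round(f_ratio, 3))
```

Whether a ratio of this size indicates a significant difference among the coatings is a question for the distribution theory developed later.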




EXAMPLE 25
Charles Darwin performed an experiment to determine whether self-fertilized
and cross-fertilized plants have different growth rates. Pairs of Zea mays
plants, one self- and the other cross-fertilized, were planted in pots, and their
heights were measured after a specified period of time. The data Darwin
obtained were:


PLANT HEIGHT (IN 1/8 INCHES) 


Pair Cross- Self- Pair Cross- Self- 

1 188 139 9 146 132 
2 96 163 10 173 144 
3 168 160 11 186 130 
4 176 160 12 168 144 
5 153 147 13 177 102 
6 172 149 14 184 124 
7 177 149 15 96 144 
8 163 122 


Source: Darwin, C., “The Effects of Cross- and Self-Fertilization 
in the Vegetable Kingdom,” D. Appleton and Co., New York, 1902. 


These data lead to many questions, the most immediate being whether cross- 
fertilized plants have a higher growth rate than self-fertilized plants. This ex- 
ample will be revisited later on. 
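
Because the plants were grown in pairs, a natural first summary of Darwin's data is the set of within-pair differences (cross-fertilized minus self-fertilized). The sketch below computes these differences, their average, and the number of pairs in which the cross-fertilized plant was taller. It is only a descriptive preview, and the variable names are illustrative.

```python
# Heights in units of 1/8 inch, pairs 1 through 15, as tabulated above.
cross = [188, 96, 168, 176, 153, 172, 177, 163, 146, 173, 186, 168, 177, 184, 96]
selfed = [139, 163, 160, 160, 147, 149, 149, 122, 132, 144, 130, 144, 102, 124, 144]

# Within-pair difference: cross-fertilized height minus self-fertilized height.
differences = [c - s for c, s in zip(cross, selfed)]
mean_difference = sum(differences) / len(differences)
pairs_cross_taller = sum(1 for d in differences if d > 0)

print(differences)
print(round(mean_difference, 3), pairs_cross_taller)
```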


1.2 Some Fundamental Concepts


One of the most basic concepts in probability and statistics is that of a random
experiment. Although a more precise definition is possible, we will restrict 
ourselves here to understanding a random experiment as a procedure which 
is carried out under a certain set of conditions; it can be repeated any number 
of times under the same set of conditions, and upon the completion of the 
procedure certain results are observed. The results obtained are denoted by s 
and are called sample points. The set of all possible sample points is denoted 
by S and is called a sample space. Subsets of S are called events and are 
denoted by capital letters A, B, C, etc. An event consisting of one sample point 
only, {s}, is called a simple event and composite otherwise. An event A occurs 
(or happens) if the outcome of the random experiment (that is, the sample 
point s) belongs in A, s ∈ A; A does not occur (or does not happen) if s ∉ A.
The event S always occurs and is called the sure or certain event. On the
other hand, the event ∅ never happens and is called the impossible event. Of
course, the relation A ⊆ B between two events A and B means that the event
B occurs whenever A does, but not necessarily the opposite. (See Figure 1.1
for the Venn diagram depicting the relation A ⊆ B.) The events A and B are
equal if both A ⊆ B and B ⊆ A.

Some random experiments are given in the following along with corre- 
sponding sample spaces and some events. 


Figure 1.1  A ⊆ B; in fact, A ⊂ B, because $s_2 \in B$, but $s_2 \notin A$.














EXAMPLE 26
Tossing three distinct coins once.


Then, with H and T standing for “heads” and “tails,” respectively, a sample
space is:


S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.

The event A = “no more than 1 H occurs” is given by:

A = {TTT, HTT, THT, TTH}.


EXAMPLE 27
Rolling once two distinct dice.


Then a sample space is:

S = {(1, 1), (1, 2), ..., (1, 6), ..., (6, 1), (6, 2), ..., (6, 6)},

and the event B = “the sum of numbers on the upper faces is ≤ 5” is:

B = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (4, 1)}.
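
Small sample spaces such as those for the three coins and the two dice can be listed mechanically. The following sketch enumerates both spaces and picks out the events A (no more than one H) and B (sum of the two faces at most 5, which matches the ten points listed for B). The variable names are illustrative.

```python
from itertools import product

# Three distinct coins tossed once: all sequences of H and T of length 3.
coin_space = ["".join(outcome) for outcome in product("HT", repeat=3)]
event_A = [s for s in coin_space if s.count("H") <= 1]

# Two distinct dice rolled once: all ordered pairs (i, j) with 1 <= i, j <= 6.
dice_space = list(product(range(1, 7), repeat=2))
event_B = [(i, j) for (i, j) in dice_space if i + j <= 5]

print(len(coin_space), event_A)
print(len(dice_space), event_B)
```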


EXAMPLE 28
Drawing a card from a well-shuffled standard deck of 52 cards. Denoting by
C, D, H, and S clubs, diamonds, hearts, and spades, respectively, by J, Q, K
Jack, Queen, and King, and using 1 for aces, the sample space is given by:


$S = \{1_C, \ldots, 1_S, \ldots, 10_C, \ldots, 10_S, \ldots, K_C, \ldots, K_S\}$.


An event A may be described by: A = “red and face card,” so that
$A = \{J_D, J_H, Q_D, Q_H, K_D, K_H\}$.


EXAMPLE 29
Drawing (without replacement) two balls from an urn containing m numbered
black balls and n numbered red balls.

Then, in obvious notation, a sample space here is:

\[
\begin{aligned}
S = \{\, & b_1b_2, \ldots, b_1b_m, \ldots, b_mb_1, \ldots, b_mb_{m-1}; \\
& b_1r_1, \ldots, b_1r_n, \ldots, b_mr_1, \ldots, b_mr_n; \\
& r_1b_1, \ldots, r_1b_m, \ldots, r_nb_1, \ldots, r_nb_m; \\
& r_1r_2, \ldots, r_1r_n, \ldots, r_nr_1, \ldots, r_nr_{n-1} \,\}.
\end{aligned}
\]

An event A may be the following: A = “the sum of the numbers on the balls
does not exceed 4.” Then

\[
\begin{aligned}
A = \{\, & b_1b_2, b_1b_3, b_2b_1, b_3b_1, b_1r_1, b_1r_2, b_1r_3, \\
& b_2r_1, b_2r_2, b_3r_1, r_1b_1, r_1b_2, r_1b_3, r_2b_1, \\
& r_2b_2, r_3b_1, r_1r_2, r_1r_3, r_2r_1, r_3r_1 \,\}
\end{aligned}
\]

(assuming that $m, n \geq 3$).


EXAMPLE 30
Recording the gender of children of two-children families.


With b and g standing for boy and girl, and with the first letter on the left 
denoting the older child, a sample space is: S = {bb, bg, gb, gg}. An event B 
may be: B = “children of both genders.” Then B = {bg, gb}. 


EXAMPLE 31
Ranking five horses in a horse race.


Then the suitable sample space S consists of 120 sample points, corresponding 
to the 120 permutations of the numbers 1, 2, 3, 4, 5. (We exclude ties.) The event 
A = “horse #3 comes second” consists of the 24 sample points, where 3 always 
occurs in the second place. 
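
The counts in the horse race example can be checked by listing all rankings. In the sketch below a ranking is represented as a tuple whose position gives the finishing place and whose entry gives the horse number. This representation is an assumption made for illustration.

```python
from itertools import permutations

# Each ranking is a tuple (1st place, 2nd place, ..., 5th place) of horse numbers.
rankings = list(permutations(range(1, 6)))

# Event A: horse #3 comes second, i.e., the entry in the second position is 3.
event_A = [r for r in rankings if r[1] == 3]

print(len(rankings), len(event_A))
```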


EXAMPLE 32
Tossing a coin repeatedly until H appears for the first time.


The suitable sample space here is:
S = {H, TH, TTH, ..., TT...TH, ...}.
Then the event A = “the 1st H does not occur before the 10th tossing” is given
by:
$A = \{\underbrace{T \cdots T}_{9}H,\ \underbrace{T \cdots T}_{10}H,\ \ldots\}$.

EXAMPLE 33
Recording the number of telephone calls served by a certain telephone exchange
center within a specified period of time.


Clearly, the sample space here is: S = {0,1,...,C}, where C is a suitably 
large number associated with the capacity of the center. For mathematical 
convenience, we often take S to consist of all nonnegative integers; that is, 
S = {0,1,...}. 


EXAMPLE 34
Recording the number of traffic accidents which occurred in a specified location
within a certain period of time.


As in the previous example, S = {0, 1,..., M} for a suitable number M. If M 
is sufficiently large, then S is taken to be: S = {0, 1, ...}.


EXAMPLE 35
Recording the number of particles emitted by a certain radioactive source
within a specified period of time. 


As in the previous two examples, S is taken to be: S = {0, 1,..., M}, where M 
is often a large number, and then as before S is modified to be: S = {0, 1, ...}. 


EXAMPLE 36
Recording the lifetime of an electronic device, or of an electrical appliance,
etc. 


Here S is the interval (0, T) for some reasonable value of T; that is, S = (0, T). 
Sometimes, for justifiable reasons, we take S = (0, ∞).


EXAMPLE 37
Recording the distance from the bull’s eye of the point where a dart, aiming at
the bull’s eye, actually hits the plane. Here it is clear that S = (0, ∞).


EXAMPLE 38
Measuring the dosage of a certain medication, administered to a patient, until
a positive reaction is observed. 


Here S = (0, D) for some suitable D (not rendering the medication lethal!).


EXAMPLE 39
Recording the yearly income of a target population.


If the incomes are measured in $ and cents, the outcomes are fractional numbers
in an interval [0, M] for some reasonable M. Again, for reasons similar to
those cited in Example 36, S is often taken to be S = [0, ∞).


EXAMPLE 40
Waiting until the time the Dow-Jones Industrial Average index reaches or
surpasses a specified level.


Here, with reasonable qualifications, we may choose to take S = (0, ∞).


Examples 1-25, suitably interpreted, may also serve as further illustrations of 
random experiments. All examples described previously will be revisited on 
various occasions. 


For instance, in Example 1 and in self-explanatory notation, a suitable sample
space is:


\[
S = \{A_hB_hM_h,\ A_hB_hM_\ell,\ A_hB_\ell M_h,\ A_\ell B_hM_h,\ A_hB_\ell M_\ell,\ A_\ell B_hM_\ell,\ A_\ell B_\ell M_h,\ A_\ell B_\ell M_\ell\}.
\]




Then the events A = “no chemical occurs at high level” and B = “at least two
chemicals occur at high levels” are given by:


\[
A = \{A_\ell B_\ell M_\ell\}, \qquad B = \{A_hB_hM_\ell,\ A_hB_\ell M_h,\ A_\ell B_hM_h,\ A_hB_hM_h\}.
\]


In Example 2, a patient is classified according to the result of the test, giving 
rise to the following sample space: 


S = {D+, D−, N+, N−},


where D and N stand for the events “patient has the disease” and “patient
does not have the disease,” respectively. Then the event A = “false diagnosis
of test” is given by: A = {D−, N+}.


In Example 5, the suitable probability model is the so-called binomial model.
The sample space S is the set of $2^n$ points, each point consisting of a sequence
of n S’s and F’s, S standing for success (on behalf of Jones) and F standing for
failure. Then the questions posed can be answered easily.


Examples 6 through 10 can be discussed in the same framework as that of 
Example 5 with obvious modifications in notation. 


In Example 11, a suitable sample space is: 
S={M, MM, M°M°M,..., M°---M°M,...}, 


where M stands for the passing by of a Mercedes car. Then the events A and 
B, where A = “Mercedes was the 5th car passed by” and B = “Mercedes was 
spotted after the first 3 cars passed by” are given by: 


$A = \{M^c M^c M^c M^c M\}$ and $B = \{M^c M^c M^c M,\ M^c M^c M^c M^c M,\ \ldots\}.$


In Example 12, a suitable sample space is: $S = \{0, 1, \ldots, M\}$ for an appropriately
large (integer) $M$; for mathematical convenience, $S$ is often taken to be:
$S = \{0, 1, 2, \ldots\}$.


In Example 13, a suitable sample space is: S = (0, T) for some reasonable 
value of T. In such cases, if T is very large, mathematical convenience dictates 
replacement of the previous sample space by: $S = (0, \infty)$.


Examples 14 and 15 can be treated in the same framework as Example 13 with 
obvious modifications in notation. 


In Example 18, a suitable sample space $S$ is the set of $4^n$ points, each point
consisting of a sequence of $n$ symbols A, B, AB, and O. The underlying probability
model is the so-called multinomial model, and the questions posed can
be discussed by available methodology. Actually, there is no need even to refer
to the sample space $S$. All one has to do is to consider the outcomes in the $n$
trials and then classify the $n$ outcomes into four categories A, B, AB, and O.

Example 19 fits into the same framework as that of Example 18. Here the
suitable S consists of 1211208 points, each point being a sequence of symbols
representing the 12 months. As in the previous example, there is no need,
however, even to refer to this sample space. Example 20 is also of the same type.


In many cases, questions posed can be discussed without reference to any 
explicit sample space. This is the case, for instance, in Examples 16-17 and 
21-25. 


In the examples discussed previously, we have seen sample spaces consisting
of finitely many sample points (Examples 26-31), sample spaces consisting of
countably infinitely many points (for example, as many as the positive integers)
(Example 32 and also Examples 33-35 if we replace $C$ and $M$ by $\infty$ for
mathematical convenience), and sample spaces consisting of as many sample points
as there are in a nondegenerate finite or infinite interval in the real line, which
interval may also be the entire real line (Examples 36-40). Sample spaces with
countably many points (i.e., either finitely many or countably infinitely many)
are referred to as discrete sample spaces. Sample spaces with sample points
as many as the numbers in a nondegenerate finite or infinite interval in the real
line $\mathbb{R} = (-\infty, \infty)$ are referred to as continuous sample spaces.


Returning now to events, when one is dealing with them, one may perform
the same operations as those with sets. Thus, the complement of the event $A$,
denoted by $A^c$, is the event defined by: $A^c = \{s \in S;\ s \notin A\}$. The event $A^c$ is
presented by the Venn diagram in Figure 1.2. So $A^c$ occurs whenever $A$ does
not, and vice versa.


Figure 1.2  $A^c$ Is the Shaded Region


The union of the events $A_1, \ldots, A_n$, denoted by $A_1 \cup \ldots \cup A_n$ or $\bigcup_{j=1}^{n} A_j$, is the
event defined by $\bigcup_{j=1}^{n} A_j = \{s \in S;\ s \in A_j \text{ for at least one } j = 1, \ldots, n\}$. So
the event $\bigcup_{j=1}^{n} A_j$ occurs whenever at least one of $A_j$, $j = 1, \ldots, n$ occurs. For
$n = 2$, $A_1 \cup A_2$ is presented in Figure 1.3. The definition extends to an infinite
number of events. Thus, for countably infinitely many events $A_j$, $j = 1, 2, \ldots$,
one has $\bigcup_{j=1}^{\infty} A_j = \{s \in S;\ s \in A_j \text{ for at least one } j = 1, 2, \ldots\}$.











Figure 1.3  $A_1 \cup A_2$ Is the Shaded Region


The intersection of the events $A_j$, $j = 1, \ldots, n$ is the event denoted by
$A_1 \cap \cdots \cap A_n$ or $\bigcap_{j=1}^{n} A_j$ and is defined by $\bigcap_{j=1}^{n} A_j = \{s \in S;\ s \in A_j \text{ for all } j = 1, \ldots, n\}$.
Thus, $\bigcap_{j=1}^{n} A_j$ occurs whenever all $A_j$, $j = 1, \ldots, n$
occur simultaneously. For $n = 2$, $A_1 \cap A_2$ is presented in Figure 1.4. This
definition extends to an infinite number of events. Thus, for countably infinitely
many events $A_j$, $j = 1, 2, \ldots$, one has $\bigcap_{j=1}^{\infty} A_j = \{s \in S;\ s \in A_j \text{ for all } j = 1, 2, \ldots\}$.


Figure 1.4  $A_1 \cap A_2$ Is the Shaded Region


If $A_1 \cap A_2 = \emptyset$, the events $A_1$ and $A_2$ are called disjoint (see Figure 1.5).
The events $A_j$, $j = 1, 2, \ldots$, are said to be mutually or pairwise disjoint, if
$A_i \cap A_j = \emptyset$ whenever $i \neq j$.


Figure 1.5  $A_1$ and $A_2$ Are Disjoint; That Is, $A_1 \cap A_2 = \emptyset$














The differences $A_1 - A_2$ and $A_2 - A_1$ are the events defined by $A_1 - A_2 =
\{s \in S;\ s \in A_1,\ s \notin A_2\}$, $A_2 - A_1 = \{s \in S;\ s \in A_2,\ s \notin A_1\}$ (see Figure 1.6).


From the definition of the preceding operations, the following properties fol- 
low immediately, and they are listed here for reference. 


1. $S^c = \emptyset$, $\emptyset^c = S$, $(A^c)^c = A$.
2. $S \cup A = S$, $\emptyset \cup A = A$, $A \cup A^c = S$, $A \cup A = A$.
3. $S \cap A = A$, $\emptyset \cap A = \emptyset$, $A \cap A^c = \emptyset$, $A \cap A = A$.


Figure 1.6  $A_1 - A_2$ Is ////, $A_2 - A_1$ Is \\\\














The previous statements are all obvious, as is the following: $\emptyset \subseteq A$ for every
event $A$ in $S$. Also,


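4. $A_1 \cup (A_2 \cup A_3) = (A_1 \cup A_2) \cup A_3$,
   $A_1 \cap (A_2 \cap A_3) = (A_1 \cap A_2) \cap A_3$   (associative laws)
5. $A_1 \cup A_2 = A_2 \cup A_1$,
   $A_1 \cap A_2 = A_2 \cap A_1$   (commutative laws)
6. $A \cap (\cup_j A_j) = \cup_j (A \cap A_j)$,
   $A \cup (\cap_j A_j) = \cap_j (A \cup A_j)$   (distributive laws)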


In the last relations, as well as elsewhere, when the range of the index $j$ is
not indicated explicitly, it is assumed to be a finite set, such as $\{1, \ldots, n\}$, or a
countably infinite set, such as $\{1, 2, \ldots\}$.


For the purpose of demonstrating some of the set-theoretic operations just 
defined, let us consider some further concrete examples. 


EXAMPLE 41  Consider the sample space $S = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7, s_8\}$ and define the events
$A_1$, $A_2$, and $A_3$ as follows: $A_1 = \{s_1, s_2, s_3\}$, $A_2 = \{s_2, s_3, s_4, s_5\}$, $A_3 = \{s_3, s_4, s_5, s_8\}$.
Then observe that:

$A_1^c = \{s_4, s_5, s_6, s_7, s_8\}$, $A_2^c = \{s_1, s_6, s_7, s_8\}$, $A_3^c = \{s_1, s_2, s_6, s_7\}$;
$A_1 \cup A_2 = \{s_1, s_2, s_3, s_4, s_5\}$, $A_1 \cup A_3 = \{s_1, s_2, s_3, s_4, s_5, s_8\}$,
$A_2 \cup A_3 = \{s_2, s_3, s_4, s_5, s_8\}$, $A_1 \cup A_2 \cup A_3 = \{s_1, s_2, s_3, s_4, s_5, s_8\}$;
$A_1 \cap A_2 = \{s_2, s_3\}$, $A_1 \cap A_3 = \{s_3\}$, $A_1 \cap A_2 \cap A_3 = \{s_3\}$;
$A_1 - A_2 = \{s_1\}$, $A_2 - A_1 = \{s_4, s_5\}$, $A_1 - A_3 = \{s_1, s_2\}$,
$A_3 - A_1 = \{s_4, s_5, s_8\}$, $A_2 - A_3 = \{s_2\}$, $A_3 - A_2 = \{s_8\}$;
$(A_1^c)^c = \{s_1, s_2, s_3\}\ (= A_1)$, $(A_2^c)^c = \{s_2, s_3, s_4, s_5\}\ (= A_2)$,
$(A_3^c)^c = \{s_3, s_4, s_5, s_8\}\ (= A_3)$.
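
The computations of Example 41 can be checked mechanically. The sketch below is only an illustration: it represents the sample points $s_1, \ldots, s_8$ by the integers 1 through 8 (an assumption made for convenience) and uses Python's built-in set operations; the helper name `complement` is hypothetical.

```python
# Sample points s1, ..., s8 are represented by the integers 1, ..., 8.
S = set(range(1, 9))
A1, A2, A3 = {1, 2, 3}, {2, 3, 4, 5}, {3, 4, 5, 8}

def complement(A, space=S):
    """Complement of the event A relative to the sample space."""
    return space - A

print(sorted(complement(A1)))            # [4, 5, 6, 7, 8]
print(sorted(A1 | A2 | A3))              # union: [1, 2, 3, 4, 5, 8]
print(sorted(A1 & A2 & A3))              # intersection: [3]
print(sorted(A1 - A2), sorted(A2 - A1))  # differences: [1] [4, 5]
print(complement(complement(A1)) == A1)  # True
```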


An identity and DeMorgan's laws stated subsequently are of significant impor- 
tance. Their justifications are left as exercises (see Exercises 2.14 and 2.15). 


An identity  $\bigcup_j A_j = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3) \cup \cdots$


EXAMPLE 42  From Example 41, we have:


$A_1 = \{s_1, s_2, s_3\}$, $A_1^c \cap A_2 = \{s_4, s_5\}$, $A_1^c \cap A_2^c \cap A_3 = \{s_8\}$.

Note that $A_1$, $A_1^c \cap A_2$, $A_1^c \cap A_2^c \cap A_3$ are pairwise disjoint. Now $A_1 \cup (A_1^c \cap A_2) \cup
(A_1^c \cap A_2^c \cap A_3) = \{s_1, s_2, s_3, s_4, s_5, s_8\}$, which is equal to $A_1 \cup A_2 \cup A_3$; that is,


$A_1 \cup A_2 \cup A_3 = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3)$


as the preceding identity states.


The significance of the identity is that the events on the right-hand side are
pairwise disjoint, whereas the original events $A_j$, $j \geq 1$, need not be so.


DeMorgan's laws  $(\cup_j A_j)^c = \cap_j A_j^c$, $(\cap_j A_j)^c = \cup_j A_j^c$.


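EXAMPLE 43  Again from Example 41, one has:


$(A_1 \cup A_2)^c = \{s_6, s_7, s_8\}$, $A_1^c \cap A_2^c = \{s_6, s_7, s_8\}$;
$(A_1 \cup A_2 \cup A_3)^c = \{s_6, s_7\}$, $A_1^c \cap A_2^c \cap A_3^c = \{s_6, s_7\}$;
$(A_1 \cap A_2)^c = \{s_1, s_4, s_5, s_6, s_7, s_8\}$, $A_1^c \cup A_2^c = \{s_1, s_4, s_5, s_6, s_7, s_8\}$;
$(A_1 \cap A_2 \cap A_3)^c = \{s_1, s_2, s_4, s_5, s_6, s_7, s_8\}$,
$A_1^c \cup A_2^c \cup A_3^c = \{s_1, s_2, s_4, s_5, s_6, s_7, s_8\}$,
so that
$(A_1 \cup A_2)^c = A_1^c \cap A_2^c$, $(A_1 \cup A_2 \cup A_3)^c = A_1^c \cap A_2^c \cap A_3^c$,
$(A_1 \cap A_2)^c = A_1^c \cup A_2^c$, $(A_1 \cap A_2 \cap A_3)^c = A_1^c \cup A_2^c \cup A_3^c$, as DeMorgan's laws state.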
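
Both the identity and DeMorgan's laws can be verified numerically on the sets of Example 41. The following sketch is an illustration under the assumption that $s_1, \ldots, s_8$ are coded as 1 through 8; the function name `disjoint_pieces` is hypothetical. It builds the pieces $A_1$, $A_1^c \cap A_2$, $A_1^c \cap A_2^c \cap A_3$ by removing from each event everything already covered by the earlier ones.

```python
S = set(range(1, 9))
events = [{1, 2, 3}, {2, 3, 4, 5}, {3, 4, 5, 8}]  # A1, A2, A3 of Example 41

def disjoint_pieces(events):
    """Return A1, A1^c ∩ A2, A1^c ∩ A2^c ∩ A3, ... as a list of sets."""
    pieces, seen = [], set()
    for A in events:
        pieces.append(A - seen)  # part of A not covered by earlier events
        seen |= A
    return pieces

pieces = disjoint_pieces(events)
union = set().union(*events)
intersection = set.intersection(*events)

# The pieces are pairwise disjoint and have the same union as the events.
assert set().union(*pieces) == union
assert all(p.isdisjoint(q) for i, p in enumerate(pieces) for q in pieces[i + 1:])

# DeMorgan's laws on this particular example
assert S - union == set.intersection(*[S - A for A in events])
assert S - intersection == set().union(*[S - A for A in events])

print([sorted(p) for p in pieces])  # [[1, 2, 3], [4, 5], [8]]
```

A check on one example is of course not a proof; the general justifications are the subject of Exercises 2.14 and 2.15.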


As a further demonstration of how complements, unions, and intersections of 
sets are used for the expression of new sets, consider the following example. 


In terms of the events $A_1$, $A_2$, and $A_3$ (in some sample space $S$) and, perhaps,
their complements, unions, and intersections, express the following events:
$D_i$ = “$A_i$ does not occur,” $i = 1, 2, 3$, so that $D_1 = A_1^c$, $D_2 = A_2^c$, $D_3 = A_3^c$;
$E$ = “all $A_1$, $A_2$, $A_3$ occur,” so that $E = A_1 \cap A_2 \cap A_3$;
$F$ = “none of $A_1$, $A_2$, $A_3$ occurs,” so that $F = A_1^c \cap A_2^c \cap A_3^c$;
$G$ = “at least one of $A_1$, $A_2$, $A_3$ occurs,” so that $G = A_1 \cup A_2 \cup A_3$;
$H$ = “exactly two of $A_1$, $A_2$, $A_3$ occur,” so that $H = (A_1 \cap A_2 \cap A_3^c) \cup
(A_1 \cap A_2^c \cap A_3) \cup (A_1^c \cap A_2 \cap A_3)$;
$I$ = “exactly one of $A_1$, $A_2$, $A_3$ occurs,” so that $I = (A_1 \cap A_2^c \cap A_3^c) \cup
(A_1^c \cap A_2 \cap A_3^c) \cup (A_1^c \cap A_2^c \cap A_3)$.
It also follows that:
$G$ = “exactly one of $A_1$, $A_2$, $A_3$ occurs” $\cup$ “exactly two of $A_1$, $A_2$, $A_3$ occur” $\cup$
“all $A_1$, $A_2$, $A_3$ occur”
$= I \cup H \cup E$.
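
The events $E$, $F$, $H$, and $I$ above all have the form “exactly $k$ of $A_1$, $A_2$, $A_3$ occur.” A minimal sketch of this idea follows; it assumes, purely for illustration, the concrete sets of Example 41 (the example in the text is stated for an arbitrary sample space), and the helper name `exactly_k` is hypothetical.

```python
def exactly_k(events, space, k):
    """Sample points that belong to exactly k of the given events."""
    return {s for s in space if sum(s in A for A in events) == k}

# Illustrative choice: the sets of Example 41, with s1, ..., s8 coded as 1, ..., 8.
S = set(range(1, 9))
events = [{1, 2, 3}, {2, 3, 4, 5}, {3, 4, 5, 8}]

E = exactly_k(events, S, 3)  # all three occur
H = exactly_k(events, S, 2)  # exactly two occur
I = exactly_k(events, S, 1)  # exactly one occurs
F = exactly_k(events, S, 0)  # none occurs
G = set().union(*events)     # at least one occurs

print(sorted(E), sorted(H), sorted(I), sorted(F))  # [3] [2, 4, 5] [1, 8] [6, 7]
print(G == I | H | E)                              # True
```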


This section is concluded with the concept of a monotone sequence of events.
Namely, the sequence of events $\{A_n\}$, $n \geq 1$, is said to be monotone, if either
$A_1 \subseteq A_2 \subseteq \ldots$ (increasing) or $A_1 \supseteq A_2 \supseteq \ldots$ (decreasing). In case of an
increasing sequence, the union $\bigcup_{j=1}^{\infty} A_j$ is called the limit of the sequence, and
in case of a decreasing sequence, the intersection $\bigcap_{j=1}^{\infty} A_j$ is called its limit.


The concept of the limit is also defined, under certain conditions, for non- 
monotone sequences of events, but we are not going to enter into it here. The 
interested reader is referred to Definition 1, page 5, of the book A Course in 
Mathematical Statistics, 2nd edition (1997), Academic Press, by G. G. Roussas. 


2.1 An airport limousine departs from a certain airport with three passengers
to be delivered in any one of three hotels denoted by $H_1$, $H_2$, $H_3$. Let
$(x_1, x_2, x_3)$ denote the number of passengers left at hotels $H_1$, $H_2$, and
$H_3$, respectively.

(i) Write out the sample space $S$ of all possible deliveries.
(ii) Consider the events A, B, C, D, and E, defined as follows, and express
them in terms of sample points.
A = “one passenger in each hotel,”
B = “all passengers in $H_1$,”
C = “all passengers in one hotel,”
D = “at least two passengers in $H_1$,”
E = “fewer passengers in $H_1$ than in any one of $H_2$ or $H_3$.”


2.2 A machine dispenses balls which are either red or black or green. Suppose
we operate the machine three successive times and record the color of 
the balls dispensed, to be denoted by r, b, and g for the respective colors. 
(i) Write out an appropriate sample space S for this experiment. 

(ii) Consider the events A, B, and C, defined as follows, and express 
them by means of sample points. 
A = “all three colors appear,” 
B = “only two colors appear,” 
C = “at least two colors appear.” 


2.3 A university library has five copies of a textbook to be used in a certain 
class. Of these copies, numbers 1 through 3 are of the 1st edition, and 
numbers 4 and 5 are of the 2nd edition. Two of these copies are chosen 
at random to be placed on a 2-hour reserve. 

(i) Write out an appropriate sample space S. 
(ii) Consider the events A, B, C, and D, defined as follows, and express 
them in terms of sample points. 
A = “both books are of the 1st edition,” 
B = “both books are of the 2nd edition,” 
C = “one book of each edition,” 
D = “no book is of the 2nd edition.”


2.4 A large automobile dealership sells three brands of American cars,
denoted by $a_1$, $a_2$, $a_3$; two brands of Asian cars, denoted by $b_1$, $b_2$; and one
brand of a European car, denoted by $c$. We observe the cars sold in two
consecutive sales. Then:

(i) Write out an appropriate sample space for this experiment.
(ii) Express the events defined as follows in terms of sample points:


A = “American brands in both sales,” 


B = “American brand in the first sale and Asian brand in the second 
sale,” 


C = “American brand in one sale and Asian brand in the other sale,” 
D = “European brand in one sale and Asian brand in the other sale.” 


2.5 Of two gas stations I and II located at a certain intersection, I has five gas 
pumps and II has six gas pumps. At a given time of day, observe the
numbers x and y of pumps in use in stations I and II, respectively. 

(i) Write out the sample space S for this experiment. 
(ii) Consider the events A, B, C, and D, defined as follows, and express 
them in terms of sample points. 


A = “three pumps are in use in station I,” 
B = “the number of pumps in use in both stations is the same,” 


C = “the number of pumps in use in station II is larger than that in 
station I,” 


D = “the total number of pumps in use in both stations is not greater 
than 4.” 


2.6 At a certain busy airport, denote by A, B, C, and D the events defined as 
follows: 
A = “at least 5 planes are waiting to land,” 
B = “at most 3 planes are waiting to land,” 
C = “at most 2 planes are waiting to land,” 
D = “exactly 2 planes are waiting to land.” 
In terms of the events A, B, C, and D and, perhaps, their complements, 
express the following events: 
E = “at most 4 planes are waiting to land,” 
F = “at most 1 plane is waiting to land,” 
G = “exactly 3 planes are waiting to land,” 
H = “exactly 4 planes are waiting to land,” 
I = “at least 4 planes are waiting to land.” 


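2.7 Let $S = \{(x, y) \in \mathbb{R}^2;\ -3 \leq x \leq 3,\ 0 \leq y \leq 4,\ x \text{ and } y \text{ integers}\}$, and
define the events A, B, C, and D as follows:


$A = \{(x, y) \in S;\ x = y\}$, $B = \{(x, y) \in S;\ x = -y\}$,
$C = \{(x, y) \in S;\ x^2 = y^2\}$, $D = \{(x, y) \in S;\ x^2 + y^2 \leq 5\}$.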


List the members of the events just defined.


2.8 In terms of the events $A_1$, $A_2$, $A_3$ in a sample space $S$ and, perhaps, their
complements, express the following events:
(i) $B_0 = \{s \in S;\ s \text{ belongs to none of } A_1, A_2, A_3\}$,

(ii) $B_1 = \{s \in S;\ s \text{ belongs to exactly one of } A_1, A_2, A_3\}$,

(iii) $B_2 = \{s \in S;\ s \text{ belongs to exactly two of } A_1, A_2, A_3\}$,

(iv) $B_3 = \{s \in S;\ s \text{ belongs to all of } A_1, A_2, A_3\}$,

(v) $C = \{s \in S;\ s \text{ belongs to at most two of } A_1, A_2, A_3\}$,

(vi) $D = \{s \in S;\ s \text{ belongs to at least one of } A_1, A_2, A_3\}$.


2.9 If for three events A, B, and C it happens that either $A \cup B \cup C = A$ or
$A \cap B \cap C = A$, what conclusions can you draw?


2.10 Show that $A$ is the impossible event (that is, $A = \emptyset$), if and only if
$(A \cap B^c) \cup (A^c \cap B) = B$ for every event $B$.


2.11 Let A, B, and C be arbitrary events in S. Determine whether each of the 
following statements is correct or incorrect. 
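(i) $(A - B) \cup B = (A \cap B^c) \cup B = B$,
(ii) $(A \cup B) - A = (A \cup B) \cap A^c = B$,
(iii) $(A \cap B) \cap (A - B) = (A \cap B) \cap (A \cap B^c) = \emptyset$,
(iv) $(A \cup B) \cap (B \cup C) \cap (C \cup A) = (A \cap B) \cup (B \cap C) \cup (C \cap A)$.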


2.12 For any three events A, B, and C in a sample space $S$ show that the
transitive property holds; that is, $A \subseteq B$ and $B \subseteq C$ imply that $A \subseteq C$.


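2.13 Establish the distributive laws, namely $A \cap (\cup_j A_j) = \cup_j (A \cap A_j)$ and
$A \cup (\cap_j A_j) = \cap_j (A \cup A_j)$.


2.14 Establish the identity:
$\bigcup_j A_j = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3) \cup \cdots$

2.15 Establish DeMorgan's laws, namely
$(\cup_j A_j)^c = \cap_j A_j^c$ and $(\cap_j A_j)^c = \cup_j A_j^c$.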
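2.16 Let $S = \mathbb{R}$ and, for $n = 1, 2, \ldots$, define the events $A_n$ and $B_n$ by:


$A_n = \{x \in \mathbb{R};\ -5 + \frac{1}{n} < x < 20 - \frac{1}{n}\}$, $B_n = \{x \in \mathbb{R};\ 0 < x < 7 + \frac{3}{n}\}$.


(i) Show that the sequence $\{A_n\}$ is increasing and the sequence $\{B_n\}$ is
decreasing.
(ii) Identify the limits, $\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$ and $\lim_{n \to \infty} B_n = \bigcap_{n=1}^{\infty} B_n$.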


1.3 Random Variables


For every random experiment, there is at least one sample space appropriate
for the random experiment under consideration. In many cases, however,
much of the work can be done without reference to an explicit sample space.
Instead, what are used extensively are random variables and their distributions.
Those quantities will be studied extensively in subsequent chapters. What is
done in this section is the introduction of the concept of a random variable.

Formally, a random variable, to be shortened to r.v., is simply a function
defined on a sample space $S$ and taking values in the real line $\mathbb{R} = (-\infty, \infty)$.
Random variables are denoted by capital letters, such as $X$, $Y$, $Z$, with or
without subscripts. Thus, the value of the r.v. $X$ at the sample point $s$ is $X(s)$, and
the set of all values of $X$, that is, the range of $X$, is usually denoted by $X(S)$.
The only difference between a r.v. and a function in the usual calculus sense
is that the domain of a r.v. is a sample space $S$, which may be an abstract set,
unlike the usual concept of a function, whose domain is a subset of $\mathbb{R}$ or of a
Euclidean space of higher dimension. The usage of the term “random variable”
employed here rather than that of a function may be explained by the fact that
a r.v. is associated with the outcomes of a random experiment. Thus, one may
argue that $X(s)$ is not known until the random experiment is actually carried
out and $s$ becomes available. Of course, on the same sample space, one may
define many distinct r.v.'s.

In reference to Example 26, instead of the sample space $S$ exhibited there,
one may be interested in the number of heads appearing each time the
experiment is carried out. This leads to the definition of the r.v. $X$ by: $X(s)$ = # of
H's in $s$. Thus, $X(HHH) = 3$, $X(HHT) = X(HTH) = X(THH) = 2$, $X(HTT) =
X(THT) = X(TTH) = 1$, and $X(TTT) = 0$, so that $X(S) = \{0, 1, 2, 3\}$. The
notation $(X \leq 1)$ stands for the event $\{s \in S;\ X(s) \leq 1\} = \{TTT, HTT, THT, TTH\}$.
In the general case and for $B \subseteq \mathbb{R}$, the notation $(X \in B)$ stands for the event
$A$ in the sample space $S$ defined by: $A = \{s \in S;\ X(s) \in B\}$. It is also denoted
by $X^{-1}(B)$.
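
Since a r.v. is just a function on the sample space, the objects above translate directly into code. The sketch below is an illustration only: it encodes the outcomes of Example 26 as three-letter strings of H's and T's (an assumption about representation), and the helper name `inverse_image` is hypothetical.

```python
from itertools import product

# Sample space of Example 26: all sequences of three H's and T's.
S = ["".join(seq) for seq in product("HT", repeat=3)]

def X(s):
    """Random variable: number of heads in the outcome s."""
    return s.count("H")

def inverse_image(X, B, space):
    """The event (X in B) = {s in S; X(s) in B}."""
    return {s for s in space if X(s) in B}

print(sorted({X(s) for s in S}))            # range X(S): [0, 1, 2, 3]
print(sorted(inverse_image(X, {0, 1}, S)))  # (X <= 1): ['HTT', 'THT', 'TTH', 'TTT']
```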

In reference to Example 27, a r.v. $X$ of interest may be defined by $X(s)$ =
sum of the numbers in the pair $s$. Thus, $X((1, 1)) = 2$, $X((1, 2)) = X((2, 1)) =
3, \ldots, X((6, 6)) = 12$, and $X(S) = \{2, 3, \ldots, 12\}$. Also, $X^{-1}(\{7\}) = \{s \in S;\
X(s) = 7\} = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$. Similarly for Examples
28-31.

In reference to Example 32, a natural r.v. $X$ is defined to denote the number
of tosses needed until the first head occurs. Thus, $X(H) = 1$, $X(TH) =
2, \ldots, X(\underbrace{T \ldots T}_{n-1} H) = n, \ldots$, so that $X(S) = \{1, 2, \ldots\}$. Also, $(X > 5) =
(X \geq 6) = \{TTTTTH, TTTTTTH, \ldots\}$.

In reference to Example 33, an obvious r.v. X is: X(s) = s, s = 0,1,..., 
and similarly for Examples 34-35. 

In reference to Example 36, a r.v. X of interest is X(s) = s, s € S, and 
similarly for Examples 37—40. 

Also, in reference to Example 5, an obvious r.v. X may be defined as fol- 
lows: X(s) = # of S’s in s. Then, clearly, X(S) = {0,1,..., n}. Similarly for 
Examples 6-10. 

In reference to Example 11, a r.v. $X$ may be defined thus: $X(s)$ = the position
of $M$ in $s$. Then, clearly, $X(S) = \{1, 2, \ldots\}$.

In reference to Example 18, the r.v.'s of obvious interest are: $X_A$ = # of
those persons, out of $n$, having blood type A, and similarly for $X_B$, $X_{AB}$, $X_O$.
Similarly for Examples 19 and 20.

From the preceding examples, two kinds of r.v.'s emerge: random variables
which take on countably many values, such as those defined in conjunction
with Examples 26-31 and 32-35, and r.v.'s which take on all values in a
nondegenerate (finite or not) interval in $\mathbb{R}$. Such are r.v.'s defined in conjunction
with Examples 36-40. Random variables of the former kind are called
discrete r.v.'s (or r.v.'s of the discrete type), and r.v.'s of the latter type are
called continuous r.v.'s (or r.v.'s of the continuous type).

More generally, a r.v. $X$ is called discrete (or of the discrete type), if $X$ takes
on countably many values; i.e., either finitely many values such as $x_1, \ldots, x_n$,
or countably infinitely many values such as $x_0, x_1, \ldots$ or $x_1, x_2, \ldots$. On the other
hand, $X$ is called continuous (or of the continuous type) if $X$ takes all values
in a proper interval $I \subseteq \mathbb{R}$. Although there are other kinds of r.v.'s, in this book
we will restrict ourselves to discrete and continuous r.v.'s as just defined.

The study of r.v.'s is one of the main objectives of this book. 


3.1 In reference to Exercise 2.1, define the r.v.'s $X_i$, $i = 1, 2, 3$ as follows:
$X_i$ = # of passengers delivered to hotel $H_i$.
Determine the values of each $X_i$, $i = 1, 2, 3$, and specify the values of the
sum $X_1 + X_2 + X_3$.


3.2 In reference to Exercise 2.2, define the r.v.’s X and Y as follows: X = # of 
red balls dispensed, Y = # of balls other than red dispensed. 
Determine the values of X and Y, and specify the values of the sum X +Y. 


3.3 In reference to Exercise 2.5, define the r.v.’s X and Y as follows: X = # of 
pumps in use in station I, Y = # of pumps in use in station II. 
Determine the values of X and Y, and also of the sum X + Y. 


3.4 In reference to Exercise 2.7, define the r.v. $X$ by: $X((x, y)) = x + y$.
Determine the values of $X$, as well as the following events: $(X \leq 2)$,
$(3 < X \leq 5)$, $(X > 6)$.


3.5 Consider a year with 365 days, which are numbered serially from 1 to 365. 
Ten of those numbers are chosen at random and without replacement, 
and let X be the r.v. denoting the largest number drawn. 

Determine the values of X. 


3.6 A four-sided die has the numbers 1 through 4 written on its sides, one on 
each side. If the die is rolled twice: 
(i) Write out a suitable sample space S. 
(ii) If X is the r.v. denoting the sum of numbers appearing, determine the 
values of X. 
(iii) Determine the events: (X < 3), (2 < X < 5), (X > 8). 


3.7 From a certain target population, $n$ individuals are chosen at random
and their blood types are determined. Let $X_1$, $X_2$, $X_3$, and $X_4$ be the r.v.'s
denoting the number of individuals having blood types A, B, AB, and O,
respectively.

Determine the values of each one of these r.v.'s, as well as the values of
the sum $X_1 + X_2 + X_3 + X_4$.


3.8 A bus is expected to arrive at a specified bus stop any time between 8:00 
and 8:15 a.m., and let X be the r.v. denoting the actual time of arrival of 
the bus. 

(i) Determine the suitable sample space S for the experiment of observ- 
ing the arrival of the bus. 
(ii) What are the values of the r.v. $X$?
(iii) Determine the event: “The bus arrives within 5 minutes before the 
expiration of the expected time of arrival.”


The Concept of Probability and Basic Results


This chapter consists of five sections. The first section is devoted to the def- 
inition of the concept of probability. We start with the simplest case, where 
complete symmetry occurs, proceed with the definition by means of relative 
frequency, and conclude with the axiomatic definition of probability. The defining
properties of probability are illustrated by way of examples. Also, a number
of basic properties, resulting from the definition, are stated and justified. Some 
of them are illustrated by means of examples. The section is concluded with 
two theorems, which are stated but not proved. 

In the second section, the distribution of a r.v. is introduced. Also, the 
distribution function and the probability density function of a r.v. are defined, 
and we explain how they determine the distribution of the r.v. 

The concept of the conditional probability of an event, given another event, 
is taken up in the following section. Its definition is given, and its significance 
is demonstrated through a number of examples. This section is concluded 
with three theorems, formulated in terms of conditional probabilities. Through 
these theorems, conditional probabilities greatly simplify calculation of other- 
wise complicated probabilities. 

In the fourth section, the independence of two events is defined, and we 
also indicate how it carries over to any finite number of events. A result 
(Theorem 6) is stated which is often used by many authors without its use 
even being acknowledged. The section is concluded with an indication of how 
independence extends to random experiments. The definition of independence 
of r.v.'s is deferred to another chapter (Chapter 5). 

In the final section of the chapter, the so-called fundamental principle 
of counting is discussed; combinations and permutations are then obtained 
as applications of this principle. Several illustrative examples are also 
provided.


2.1 Definition of Probability and Some Basic Results


When a random experiment is entertained, one of the first questions which 
arise is, what is the probability that a certain event occurs? For instance, in 
reference to Example 26 in Chapter 1, one may ask: What is the probability that 
exactly one head occurs; in other words, what is the probability of the event 
B={HTT, THT, TTH}? The answer to this question is almost automatic and 
is 3/8. The relevant reasoning goes like this: Assuming that the three coins are 
balanced, the probability of each one of the 8 outcomes, considered as simple 
events, must be 1/8. Since the event B consists of 3 sample points, it can occur 
in 3 different ways, and hence its probability must be 3/8. 

This is exactly the intuitive reasoning employed in defining the concept
of probability when two requirements are met: First, the sample space $S$ has
finitely many outcomes, $S = \{s_1, \ldots, s_n\}$, say, and second, each one of these
outcomes is “equally likely” to occur, that is, has the same chance of appearing,
whenever the relevant random experiment is carried out. This reasoning is based
on the underlying symmetry. Thus, one is led to stipulating that each one of
the (simple) events $\{s_i\}$, $i = 1, \ldots, n$ has probability $1/n$. Then the next step,
that of defining the probability of a composite event $A$, is simple; if $A$ consists
of $m$ sample points, $A = \{s_{i_1}, \ldots, s_{i_m}\}$, say ($1 \leq m \leq n$) (or none at all, in
which case $m = 0$), then the probability of $A$ must be $m/n$. The notation used
is: $P(\{s_1\}) = \cdots = P(\{s_n\}) = \frac{1}{n}$ and $P(A) = \frac{m}{n}$. Actually, this is the so-called
classical definition of probability. That is,


CLASSICAL DEFINITION OF PROBABILITY Let $S$ be a sample space, associated
with a certain random experiment and consisting of finitely many sample
points $n$, say, each of which is equally likely to occur whenever the random
experiment is carried out. Then the probability of any event $A$, consisting of
$m$ sample points ($0 \leq m \leq n$), is given by $P(A) = \frac{m}{n}$.

In reference to Example 26 in Chapter 1, $P(A) = \frac{4}{8} = \frac{1}{2} = 0.5$. In Example
27 (when the two dice are unbiased), $P(X = 7) = \frac{6}{36} = \frac{1}{6} \approx 0.167$, where
the r.v. $X$ and the event $(X = 7)$ are defined in Section 1.3. In Example 29,
when the balls in the urn are thoroughly mixed, we may assume that all of the
$(m+n)(m+n-1)$ pairs are equally likely to be selected. Then, since the event
$A$ occurs in 20 different ways, $P(A) = \frac{20}{(m+n)(m+n-1)}$. For $m = 3$ and $n = 5$,
this probability is $P(A) = \frac{20}{56} = \frac{5}{14} \approx 0.357$.
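
The classical definition amounts to counting, which the following sketch makes explicit for the three-coin and two-dice experiments. It is an illustration only; the function name `classical_probability` and the encoding of outcomes as strings and pairs are assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, space):
    """P(A) = m/n for equally likely outcomes; the event must lie in the space."""
    space, event = set(space), set(event)
    if not event <= space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))

# Three balanced coins: B = "exactly one head".
coins = {"".join(seq) for seq in product("HT", repeat=3)}
B = {s for s in coins if s.count("H") == 1}
print(classical_probability(B, coins))     # 3/8

# Two unbiased dice: the event (X = 7), X being the sum of the two numbers.
dice = set(product(range(1, 7), repeat=2))
seven = {s for s in dice if sum(s) == 7}
print(classical_probability(seven, dice))  # 1/6
```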

From the preceding (classical) definition of probability, the following
simple properties are immediate: For any event $A$, $P(A) \geq 0$; $P(S) = 1$; if two
events $A_1$ and $A_2$ are disjoint ($A_1 \cap A_2 = \emptyset$), then $P(A_1 \cup A_2) = P(A_1) + P(A_2)$.
This is so because, if $A_1 = \{s_{i_1}, \ldots, s_{i_k}\}$, $A_2 = \{s_{j_1}, \ldots, s_{j_\ell}\}$, where all $s_{i_1}, \ldots, s_{i_k}$
are distinct from all $s_{j_1}, \ldots, s_{j_\ell}$, then $A_1 \cup A_2 = \{s_{i_1}, \ldots, s_{i_k}, s_{j_1}, \ldots, s_{j_\ell}\}$ and
$P(A_1 \cup A_2) = \frac{k+\ell}{n} = \frac{k}{n} + \frac{\ell}{n} = P(A_1) + P(A_2)$.

In many cases, the stipulations made in defining the probability as above
are not met, either because $S$ does not have finitely many points (as is the case in
Examples 32, 33-35 (by replacing $C$ and $M$ by $\infty$), and 36-40 in Chapter 1), or
because the (finitely many) outcomes are not equally likely. This happens, for
instance, in Example 26 when the coins are not balanced and in Example 27
when the dice are biased. Strictly speaking, it also happens in Example 30. In
situations like this, the way out is provided by the so-called relative frequency
definition of probability. Specifically, suppose a random experiment is carried
out a large number of times $N$, and let $N(A)$ be the frequency of an event $A$,
the number of times $A$ occurs (out of $N$). Then the relative frequency of $A$
is $\frac{N(A)}{N}$. Next, suppose that, as $N \to \infty$, the relative frequencies $\frac{N(A)}{N}$ oscillate
around some number (necessarily between 0 and 1). More precisely, suppose
that $\frac{N(A)}{N}$ converges, as $N \to \infty$, to some number. Then this number is called
the probability of $A$ and is denoted by $P(A)$. That is, $P(A) = \lim_{N \to \infty} \frac{N(A)}{N}$.
(It will be seen later in this book that the assumption of convergence of the
relative frequencies $N(A)/N$ is justified subject to some qualifications.) To
summarize,


RELATIVE FREQUENCY DEFINITION OF PROBABILITY Let $N(A)$ be the number
of times an event $A$ occurs in $N$ repetitions of a random experiment, and
assume that the relative frequency of $A$, $\frac{N(A)}{N}$, converges to a limit as $N \to \infty$.
This limit is denoted by $P(A)$ and is called the probability of $A$.

At this point, it is to be observed that empirical data show that the relative 
frequency definition of probability and the classical definition of probability 
agree in the framework in which the classical definition applies. 
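
The agreement between the two definitions can be explored by simulation. The sketch below is only an illustration under stated assumptions: it uses Python's pseudorandom generator as a stand-in for tossing three balanced coins, fixes a seed for reproducibility, and the function name `relative_frequency` is hypothetical. It estimates the probability of the event “exactly one head,” whose classical probability is 3/8 = 0.375.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Relative frequency N(A)/N of A = 'exactly one head in three fair tosses'."""
    rng = random.Random(seed)  # fixed seed so that a run can be reproduced
    count = 0
    for _ in range(num_trials):
        heads = sum(rng.random() < 0.5 for _ in range(3))
        if heads == 1:
            count += 1
    return count / num_trials

for N in (100, 10_000, 100_000):
    print(N, relative_frequency(N))
```

For larger $N$ the printed values should settle near 0.375, although any single run only suggests, and does not prove, convergence.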

From the relative frequency definition of probability and the usual properties
of limits, it is immediate that: $P(A) \geq 0$ for every event $A$; $P(S) = 1$; and
for $A_1$, $A_2$ with $A_1 \cap A_2 = \emptyset$,

$$P(A_1 \cup A_2) = \lim_{N \to \infty} \frac{N(A_1 \cup A_2)}{N} = \lim_{N \to \infty} \left( \frac{N(A_1)}{N} + \frac{N(A_2)}{N} \right) = \lim_{N \to \infty} \frac{N(A_1)}{N} + \lim_{N \to \infty} \frac{N(A_2)}{N} = P(A_1) + P(A_2);$$

that is, $P(A_1 \cup A_2) = P(A_1) + P(A_2)$, provided $A_1 \cap A_2 = \emptyset$. These three
properties were also seen to be true in the classical definition of probability.
Furthermore, it is immediate that under either definition of probability,
$P(A_1 \cup \ldots \cup A_n) = P(A_1) + \cdots + P(A_n)$, provided the events are pairwise
disjoint; $A_i \cap A_j = \emptyset$, $i \neq j$.

The above two definitions of probability certainly give substance to the
concept of probability in a way consonant with our intuition about what
probability should be. However, for the purpose of cultivating the concept and
deriving deep probabilistic results, one must define the concept of probability
in terms of some basic properties, which would not contradict what we have
seen so far. This line of thought leads to the so-called axiomatic definition of
probability due to Kolmogorov.


AXIOMATIC DEFINITION OF PROBABILITY Probability is a function, denoted
by $P$, defined for each event of a sample space $S$, taking on values in the real
line $\mathbb{R}$, and satisfying the following three properties:

(P1) $P(A) \geq 0$ for every event $A$ (nonnegativity of $P$).

(P2) $P(S) = 1$ ($P$ is normed).

(P3) For countably infinitely many pairwise disjoint events $A_i$, $i = 1, 2, \ldots$,
$A_i \cap A_j = \emptyset$, $i \neq j$, it holds

$$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots; \quad \text{or} \quad P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)$$

(sigma-additivity ($\sigma$-additivity) of $P$).


COMMENTS ON THE AXIOMATIC DEFINITION 


1) Properties (P1) and (P2) are the same as the ones we have seen earlier,
whereas property (P3) is new. What we have seen above was its so-called
finitely-additive version; that is, $P(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i)$, provided
$A_i \cap A_j = \emptyset$, $i \neq j$. It will be seen below that finite-additivity is implied by
$\sigma$-additivity but not the other way around. Thus, if we are to talk about the
probability of the union of countably infinitely many pairwise disjoint events,
property (P3) must be stipulated. Furthermore, the need for such a union
of events is illustrated as follows: In reference to Example 32, calculate the
probability that the first head does not occur before the $n$th tossing. By
setting $A_i = \{\underbrace{T \ldots T}_{i-1} H\}$, $i = n, n+1, \ldots$, what we are actually after here
is $P(A_n \cup A_{n+1} \cup \cdots)$ with $A_i \cap A_j = \emptyset$, $i \neq j$, $i$ and $j \geq n$.

2) Property (P3) is superfluous (reduced to finite-additivity) when the sample
space $S$ is finite, which implies that the total number of events is finite.

3) Finite-additivity is implied by additivity for two events, $P(A_1 \cup A_2) =
P(A_1) + P(A_2)$, $A_1 \cap A_2 = \emptyset$, by way of induction.


Here are two examples in calculating probabilities. 


In reference to Example 1 in Chapter 1, take n = 58, and suppose we have the 
following configuration: 


                          BARIUM
                HIGH                  LOW
               Mercury               Mercury
Arsenic     High    Low           High    Low
High          1      3              5      9
Low           4      8             10     18


Calculate the probabilities mentioned in (i) (a)—(d). 


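DISCUSSION For simplicity, denote by $B_h$ the event that the site selected
has a high barium concentration, and likewise for other events figuring below
($A$ for arsenic, $M$ for mercury; subscript $h$ for high and $\ell$ for low).
Then:

(i)(a) $B_h = (A_h \cap B_h \cap M_h) \cup (A_h \cap B_h \cap M_\ell) \cup (A_\ell \cap B_h \cap M_h) \cup (A_\ell \cap B_h \cap M_\ell)$
and the events on the right-hand side are pairwise disjoint. Therefore
(by basic property 2 in Subsection 2.1.1, which follows):

$$P(B_h) = P(A_h \cap B_h \cap M_h) + P(A_h \cap B_h \cap M_\ell) + P(A_\ell \cap B_h \cap M_h) + P(A_\ell \cap B_h \cap M_\ell)$$
$$= \frac{1}{58} + \frac{3}{58} + \frac{4}{58} + \frac{8}{58} = \frac{16}{58} = \frac{8}{29} \simeq 0.276.$$

(i)(b) Here $P(M_h \cap A_\ell \cap B_\ell) = P(A_\ell \cap B_\ell \cap M_h) = \frac{10}{58} = \frac{5}{29} \simeq 0.172$.

(i)(c) Here the required probability is as in (a):

$$P(A_h \cap B_h \cap M_\ell) + P(A_h \cap B_\ell \cap M_h) + P(A_\ell \cap B_h \cap M_h) = \frac{12}{58} \simeq 0.207.$$

(i)(d) As above,

$$P(A_h \cap B_\ell \cap M_\ell) + P(A_\ell \cap B_h \cap M_\ell) + P(A_\ell \cap B_\ell \cap M_h) = \frac{27}{58} \simeq 0.466.$$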
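The four probabilities in the barium, mercury, and arsenic example can be checked by summing cell counts of the 58-site table. The sketch below is only an illustration: the dictionary layout, the helper name `prob`, and the verbal descriptions of events (b) through (d) are assumptions inferred from the intersections used in the discussion, not definitions taken from Example 1 itself.

```python
from fractions import Fraction

# counts[(arsenic, barium, mercury)] = number of sites; "H" = high, "L" = low.
# The numbers are the eight cells of the table in the example (total 58).
counts = {
    ("H", "H", "H"): 1, ("H", "H", "L"): 3,
    ("H", "L", "H"): 5, ("H", "L", "L"): 9,
    ("L", "H", "H"): 4, ("L", "H", "L"): 8,
    ("L", "L", "H"): 10, ("L", "L", "L"): 18,
}
total = sum(counts.values())


def prob(event):
    """Probability of the union of the (pairwise disjoint) cells satisfying event."""
    favorable = sum(n for cell, n in counts.items() if event(cell))
    return Fraction(favorable, total)


p_a = prob(lambda c: c[1] == "H")            # (a) high barium
p_b = prob(lambda c: c == ("L", "L", "H"))   # (b) high mercury, low arsenic and barium
p_c = prob(lambda c: c.count("H") == 2)      # (c) exactly two substances high (assumed wording)
p_d = prob(lambda c: c.count("H") == 1)      # (d) exactly one substance high (assumed wording)

print(p_a, p_b, p_c, p_d)
```

Exact fractions are used so that the results can be compared directly with 8/29, 5/29, 12/58, and 27/58.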


In ranking five horses in a horse race (Example 31 in Chapter 1), calculate the
probability that horse #3 finishes at least second (that is, first or second).


DISCUSSION Let $A_i$ be the event that horse #3 finishes in the $i$th position,
$i = 1, \ldots, 5$. Then the required event is $A_1 \cup A_2$, where $A_1$, $A_2$ are disjoint.
Thus,

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) = \frac{24}{120} + \frac{24}{120} = \frac{2}{5} = 0.4.$$


In tossing a coin repeatedly until H appears for the first time (Example 32 in
Chapter 1), suppose that $P\{\underbrace{T \ldots T}_{i-1} H\} = P(A_i) = q^{i-1} p$ for some $0 < p < 1$
and $q = 1 - p$ (in anticipation of Definition 3 in Section 2.4). Then

$$P\Big(\bigcup_{i=n}^{\infty} A_i\Big) = \sum_{i=n}^{\infty} P(A_i) = \sum_{i=n}^{\infty} q^{i-1} p = p \sum_{i=n}^{\infty} q^{i-1} = p\,\frac{q^{n-1}}{1-q} = p\,\frac{q^{n-1}}{p} = q^{n-1}.$$

For instance, for $p = 1/2$ and $n = 3$, this probability is $\frac{1}{4} = 0.25$. That is, when
tossing a fair coin, the probability that the first head does not appear on either
the first or the second toss (and therefore appears on the third toss, or the
fourth, etc.) is 0.25. For $n = 10$, this probability is approximately
$0.00195 \simeq 0.002$.
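The closed form $q^{n-1}$ for the coin-tossing example can be compared with a truncated version of the infinite sum it replaces. This is a minimal sketch; the function names and the truncation length `terms` are illustrative choices, not part of the text.

```python
def tail_probability(p, n):
    """Closed form: P(first head does not occur before toss n) = q**(n-1)."""
    q = 1 - p
    return q ** (n - 1)


def tail_by_partial_sum(p, n, terms=200):
    """Truncated sigma-additive sum: sum of q**(i-1) * p for i = n, ..., n+terms-1."""
    q = 1 - p
    return sum(q ** (i - 1) * p for i in range(n, n + terms))


print(tail_probability(0.5, 3))     # fair coin, n = 3
print(tail_by_partial_sum(0.5, 3))  # should agree up to rounding
print(tail_probability(0.5, 10))    # fair coin, n = 10
```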


Next, we present some basic results following immediately from the defining 
properties of the probability. First, we proceed with their listing and then with 
their justification. 
2.1.1 Some Basic Properties of a Probability Function


1. $P(\emptyset) = 0$.

2. For any pairwise disjoint events $A_1, \ldots, A_n$, $P\big(\bigcup_{j=1}^{n} A_j\big) = \sum_{j=1}^{n} P(A_j)$.

3. For any event $A$, $P(A^c) = 1 - P(A)$.

4. $A_1 \subseteq A_2$ implies $P(A_1) \leq P(A_2)$ and $P(A_2 - A_1) = P(A_2) - P(A_1)$.

5. $0 \leq P(A) \leq 1$ for every event $A$.

6. (i) For any two events $A_1$ and $A_2$:
   $$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2).$$
   (ii) For any three events $A_1$, $A_2$, and $A_3$:
   $$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - [P(A_1 \cap A_2) + P(A_1 \cap A_3) + P(A_2 \cap A_3)] + P(A_1 \cap A_2 \cap A_3).$$

7. For any events $A_1, A_2, \ldots$, $P\big(\bigcup_{i=1}^{\infty} A_i\big) \leq \sum_{i=1}^{\infty} P(A_i)$ ($\sigma$-sub-additivity),
   and $P\big(\bigcup_{i=1}^{n} A_i\big) \leq \sum_{i=1}^{n} P(A_i)$ (finite-sub-additivity).


2.1.2 Justification


1. From the obvious fact that $S = S \cup \emptyset \cup \emptyset \cup \cdots$ and property (P3),
$$P(S) = P(S \cup \emptyset \cup \emptyset \cup \cdots) = P(S) + P(\emptyset) + P(\emptyset) + \cdots$$
or $P(\emptyset) + P(\emptyset) + \cdots = 0$. By (P1), this can only happen when $P(\emptyset) = 0$.
Of course, that the impossible event has probability 0 does not come as a
surprise. Any reasonable definition of probability should imply it.

2. Take $A_i = \emptyset$ for $i \geq n + 1$, consider the following obvious relation, and use
(P3) and #1 to obtain:
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) = \sum_{i=1}^{n} P(A_i).$$

3. From (P2) and #2, $P(A \cup A^c) = P(S) = 1$ or $P(A) + P(A^c) = 1$, so that
$P(A^c) = 1 - P(A)$.

4. The relation $A_1 \subseteq A_2$, clearly, implies $A_2 = A_1 \cup (A_2 - A_1)$, so that, by #2,
$P(A_2) = P(A_1) + P(A_2 - A_1)$. Solving for $P(A_2 - A_1)$, we obtain
$P(A_2 - A_1) = P(A_2) - P(A_1)$, so that, by (P1), $P(A_1) \leq P(A_2)$.

At this point it must be pointed out that $P(A_2 - A_1)$ need not be
$P(A_2) - P(A_1)$, if $A_1$ is not contained in $A_2$.

5. Clearly, $\emptyset \subseteq A \subseteq S$ for any event $A$. Then (P1), #1 and #4 give:
$0 = P(\emptyset) \leq P(A) \leq P(S) = 1$.

6. (i) It is clear (by means of a Venn diagram, for example) that
$$A_1 \cup A_2 = A_1 \cup (A_2 \cap A_1^c) = A_1 \cup (A_2 - A_1 \cap A_2).$$
Then, by means of #2 and #4:
$$P(A_1 \cup A_2) = P(A_1) + P(A_2 - A_1 \cap A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2).$$

(ii) Apply part (i) to obtain:
$$P(A_1 \cup A_2 \cup A_3) = P[(A_1 \cup A_2) \cup A_3] = P(A_1 \cup A_2) + P(A_3) - P[(A_1 \cup A_2) \cap A_3]$$
$$= P(A_1) + P(A_2) - P(A_1 \cap A_2) + P(A_3) - P[(A_1 \cap A_3) \cup (A_2 \cap A_3)]$$
$$= P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - [P(A_1 \cap A_3) + P(A_2 \cap A_3) - P(A_1 \cap A_2 \cap A_3)]$$
$$= P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3).$$

7. By the identity in Section 2 of Chapter 1 and (P3):
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P[A_1 \cup (A_1^c \cap A_2) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n) \cup \cdots]$$
$$= P(A_1) + P(A_1^c \cap A_2) + \cdots + P(A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n) + \cdots$$
$$\leq P(A_1) + P(A_2) + \cdots + P(A_n) + \cdots \quad \text{(by \#4)}.$$
For the finite case:
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = P[A_1 \cup (A_1^c \cap A_2) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n)]$$
$$= P(A_1) + P(A_1^c \cap A_2) + \cdots + P(A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n)$$
$$\leq P(A_1) + P(A_2) + \cdots + P(A_n).$$


Next, some examples are presented to illustrate some of the properties 
#1-#7. 


(i) For two events $A$ and $B$, suppose that $P(A) = 0.3$, $P(B) = 0.5$, and
$P(A \cup B) = 0.6$. Calculate $P(A \cap B)$.

(ii) If $P(A) = 0.6$, $P(B) = 0.3$, $P(A \cap B^c) = 0.4$, and $B \subset C$, calculate
$P(A \cup B^c \cup C^c)$.


DISCUSSION


(i) From $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, we get $P(A \cap B) = P(A) +
P(B) - P(A \cup B) = 0.3 + 0.5 - 0.6 = 0.2$.

(ii) The relation $B \subset C$ implies $C^c \subset B^c$ and hence $A \cup B^c \cup C^c = A \cup B^c$.
Then $P(A \cup B^c \cup C^c) = P(A \cup B^c) = P(A) + P(B^c) - P(A \cap B^c) =
0.6 + (1 - 0.3) - 0.4 = 0.9$.


Let $A$ and $B$ be the respective events that two contracts I and II, say, are
completed by certain deadlines, and suppose that: P(at least one contract
is completed by its deadline) = 0.9 and P(both contracts are completed by
their deadlines) = 0.5. Calculate the probability: P(exactly one contract is
completed by its deadline).


DISCUSSION The assumptions made are translated as follows: $P(A \cup B) =
0.9$ and $P(A \cap B) = 0.5$. What we wish to calculate is: $P((A \cap B^c) \cup (A^c \cap B)) =
P(A \cap B^c) + P(A^c \cap B)$. Clearly, $A = (A \cap B) \cup (A \cap B^c)$ and $B = (A \cap B) \cup
(A^c \cap B)$, so that $P(A) = P(A \cap B) + P(A \cap B^c)$ and $P(B) = P(A \cap B) +
P(A^c \cap B)$. Hence, $P(A \cap B^c) = P(A) - P(A \cap B)$ and $P(A^c \cap B) = P(B) -
P(A \cap B)$. Then $P(A \cap B^c) + P(A^c \cap B) = P(A) + P(B) - 2P(A \cap B) =
[P(A) + P(B) - P(A \cap B)] - P(A \cap B) = P(A \cup B) - P(A \cap B) = 0.9 - 0.5 = 0.4$.


(i) For three events $A$, $B$, and $C$, suppose that $P(A \cap B) = P(A \cap C)$ and
$P(B \cap C) = 0$. Then show that $P(A \cup B \cup C) = P(A) + P(B) + P(C) -
2P(A \cap B)$.

(ii) For any two events $A$ and $B$, show that $P(A^c \cap B^c) = 1 - P(A) - P(B) +
P(A \cap B)$.


DISCUSSION


(i) We have $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) -
P(B \cap C) + P(A \cap B \cap C)$. But $A \cap B \cap C \subseteq B \cap C$, so that $P(A \cap B \cap C) \leq
P(B \cap C) = 0$, and therefore $P(A \cup B \cup C) = P(A) + P(B) + P(C) -
2P(A \cap B)$.

(ii) Indeed, $P(A^c \cap B^c) = P((A \cup B)^c) = 1 - P(A \cup B) = 1 - P(A) - P(B) +
P(A \cap B)$.


In ranking five horses in a horse race (Example 31 in Chapter 1), what is the 
probability that horse #3 will terminate either first or second or third? 


DISCUSSION Denote by $B$ the required event and let $A_i$ = “horse #3
terminates in the $i$th place,” $i = 1, 2, 3$. Then the events $A_1$, $A_2$, $A_3$ are pairwise
disjoint, and therefore

$$P(B) = P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3).$$

But $P(A_1) = P(A_2) = P(A_3) = \frac{24}{120} = 0.2$, so that $P(B) = 0.6$.
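The horse-race probabilities can be reproduced by enumerating all rankings of the five horses. The sketch assumes, as the value 24/120 suggests, that all 5! = 120 rankings are equally likely; the names `rankings` and `prob_place` are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

HORSES = (1, 2, 3, 4, 5)
# Each ranking is a tuple; ranking[k] is the horse finishing in place k + 1.
rankings = list(permutations(HORSES))


def prob_place(horse, places):
    """P(horse finishes in one of the given places), all rankings equally likely."""
    favorable = sum(1 for r in rankings if r.index(horse) + 1 in places)
    return Fraction(favorable, len(rankings))


p_first_or_second = prob_place(3, {1, 2})
p_top_three = prob_place(3, {1, 2, 3})
print(p_first_or_second, p_top_three)
```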

Consider a well-shuffled deck of 52 cards (Example 28 in Chapter 1), and
suppose we draw at random three cards. What is the probability that at least
one is an ace?


DISCUSSION Let $A$ be the required event, and let $A_i$ be defined by: $A_i$ =
“exactly $i$ cards are aces,” $i = 0, 1, 2, 3$. Then, clearly, $P(A) = P(A_1 \cup A_2 \cup
A_3)$. Instead, we may choose to calculate $P(A)$ through $P(A) = 1 - P(A_0)$,
where

$$P(A_0) = \frac{\binom{48}{3}}{\binom{52}{3}} = \frac{48 \times 47 \times 46}{52 \times 51 \times 50} = \frac{4{,}324}{5{,}525}, \quad \text{so that} \quad P(A) = \frac{1{,}201}{5{,}525} \simeq 0.217.$$
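The complement shortcut in the card example can be checked against the direct sum over "exactly i aces". The following sketch uses only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# P(A_0): none of the three cards drawn is an ace.
p_no_ace = Fraction(comb(48, 3), comb(52, 3))

# Route 1: complement, P(A) = 1 - P(A_0).
p_at_least_one = 1 - p_no_ace

# Route 2: finite additivity over the disjoint events "exactly i aces", i = 1, 2, 3.
p_direct = sum(
    Fraction(comb(4, i) * comb(48, 3 - i), comb(52, 3)) for i in (1, 2, 3)
)

print(p_no_ace, p_at_least_one, float(p_at_least_one))
```

Both routes give the same exact fraction, which is the point of the remark that the complement is often the easier calculation.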








Refer to Example 3 in Chapter 1 and let $C_1$, $C_2$, $C_3$ be defined by: $C_1$ = “both
$S_1$ and $S_2$ work,” $C_2$ = “$S_5$ works,” $C_3$ = “both $S_3$ and $S_4$ work,” and let
$C$ = “current is transferred from point A to point B.” Then $P(C) = P(C_1 \cup
C_2 \cup C_3)$. At this point (in anticipation of Definition 3 in Section 2.4; see also
Exercise 4.14 in this chapter), suppose that:

$$P(C_1) = p_1 p_2, \quad P(C_2) = p_5, \quad P(C_3) = p_3 p_4,$$
$$P(C_1 \cap C_2) = p_1 p_2 p_5, \quad P(C_1 \cap C_3) = p_1 p_2 p_3 p_4,$$
$$P(C_2 \cap C_3) = p_3 p_4 p_5, \quad P(C_1 \cap C_2 \cap C_3) = p_1 p_2 p_3 p_4 p_5.$$

Then (by property #6(ii)):

$$P(C) = p_1 p_2 + p_5 + p_3 p_4 - p_1 p_2 p_5 - p_1 p_2 p_3 p_4 - p_3 p_4 p_5 + p_1 p_2 p_3 p_4 p_5.$$

For example, for $p_1 = p_2 = \cdots = p_5 = 0.9$, we obtain

$$P(C) = 0.9 + 2(0.9)^2 - 2(0.9)^3 - (0.9)^4 + (0.9)^5 \simeq 0.996.$$
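The circuit probability can be cross-checked by brute force. The sketch below assumes that the five switches work independently with probabilities $p_1, \ldots, p_5$, which is the assumption under which the product formulas above hold; both function names are hypothetical.

```python
from itertools import product


def p_current_formula(p1, p2, p3, p4, p5):
    """Inclusion-exclusion for C = C1 u C2 u C3, as in the example."""
    return (p1 * p2 + p5 + p3 * p4
            - p1 * p2 * p5 - p1 * p2 * p3 * p4 - p3 * p4 * p5
            + p1 * p2 * p3 * p4 * p5)


def p_current_enumeration(p):
    """Sum the probabilities of all on/off states in which current flows.

    Assumes independent switches; p is a sequence (p1, ..., p5).
    """
    total = 0.0
    for state in product((0, 1), repeat=5):
        s1, s2, s3, s4, s5 = state
        if (s1 and s2) or s5 or (s3 and s4):
            weight = 1.0
            for works, p_i in zip(state, p):
                weight *= p_i if works else 1 - p_i
            total += weight
    return total


print(p_current_formula(0.9, 0.9, 0.9, 0.9, 0.9))
print(p_current_enumeration([0.9] * 5))
```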


This section is concluded with two very useful results stated as theorems. 
The first is a generalization of property #6 to more than three events, and the 
second is akin to the concept of continuity of a function as it applies to a 
probability function. 





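THEOREM 1 The probability of the union of any $n$ events, $A_1, \ldots, A_n$, is given by:

$$P\Big(\bigcup_{j=1}^{n} A_j\Big) = \sum_{j=1}^{n} P(A_j) - \sum_{1 \leq j_1 < j_2 \leq n} P(A_{j_1} \cap A_{j_2}) + \sum_{1 \leq j_1 < j_2 < j_3 \leq n} P(A_{j_1} \cap A_{j_2} \cap A_{j_3}) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n).$$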








Although its proof (which is by induction) will not be presented, the pattern 
of the right-hand side above follows that of property #6(i) and it is clear. First, 
sum up the probabilities of the individual events, then subtract the probabilities 
of the intersections of the events, taken two at a time (in the ascending order 
of indices), then add the probabilities of the intersections of the events, taken 
three at a time as before, and continue like this until you add or subtract 
(depending on n) the probability of the intersection of all n events. 
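The alternating pattern just described can be written as a short loop over all sub-collections of the events. The sketch below assumes a finite sample space with equally likely outcomes, so that probabilities are ratios of counts; the example events (multiples of 2, 3, and 5 among 1, ..., 30) are an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, sample_space):
    """P(union of events) by inclusion-exclusion, for equally likely outcomes."""
    n_points = len(sample_space)
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # add for odd k, subtract for even k
        for group in combinations(events, k):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), n_points)
    return total


sample_space = set(range(1, 31))
events = [{x for x in sample_space if x % d == 0} for d in (2, 3, 5)]

by_formula = union_probability(events, sample_space)
direct = Fraction(len(set.union(*events)), len(sample_space))
print(by_formula, direct)
```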


Recall that, if $A_1 \subseteq A_2 \subseteq \cdots$, then $\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$, and if $A_1 \supseteq A_2 \supseteq
\cdots$, then $\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$.


THEOREM 2 For any monotone sequence of events $\{A_n\}$, $n \geq 1$, it holds
$P(\lim_{n \to \infty} A_n) = \lim_{n \to \infty} P(A_n)$.

This theorem will be employed in many instances, and its use will be then
pointed out.





1.1 If P(A) = 0.4, P(B) = 0.6, and P(A U B) = 0.7, calculate P(A N B). 
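1.2 If for two events $A$ and $B$, it so happens that $P(A) = \frac{3}{4}$ and $P(B) = \frac{3}{8}$,
show that:

$$P(A \cup B) \geq \frac{3}{4} \quad \text{and} \quad \frac{1}{8} \leq P(A \cap B) \leq \frac{3}{8}.$$

1.3 If for the events $A$, $B$, and $C$, it so happens that $P(A) = P(B) = P(C) = 1$,
then show that:

$$P(A \cap B) = P(A \cap C) = P(B \cap C) = P(A \cap B \cap C) = 1.$$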


1.4 If the events A, B, and C are related as follows: A C B Cc C and P(A) = 
1, P(B) = $, and P(C) = 5, compute the probabilities of the following 
events: 


AMB, ANC, Bend, AN BNC, ASO BENC. 
1.5 Let S be the set of all outcomes when flipping a fair coin four times, so 
that all 16 outcomes are equally likely. Define the events A and B by: 
A = {$s \in S$; s contains more Ts than Hs},
B = {$s \in S$; any T in s precedes every H in s}.
Compute the probabilities P(A), P(B). 


1.6 Let S = {x integer; 1 < x < 200}, and define the events A, B, and C as 
follows: 


A = {$x \in S$; x is divisible by 7},
B = {$x \in S$; x = 3n + 10, for some positive integer n},
C = {$x \in S$; $x^2 + 1 < 375$}.

Calculate the probabilities P(A), P(B), and P(C). 


1.7 If two fair dice are rolled once, what is the probability that the total 
number of spots shown is: 
(i) Equal to 5?
(ii) Divisible by 3?


1.8 Students in a certain college subscribe to three news magazines A, B, 
and C according to the following proportions: 
A: 20%, B: 15%, C : 10%, 


both A and B: 5%, both A and C: 4%, both B and C : 3%, all three A, B, 
and C : 2%. 
If a student is chosen at random, what is the probability he/she subscribes
to none of the news magazines?


1.9 A high school senior applies for admission to two colleges A and B,
and suppose that: P(admitted at A) = $p_1$, P(rejected by B) = $p_2$, and
P(rejected by at least one, A or B) = $p_3$.

(i) Calculate the probability that the student is admitted by at least one
college.

(ii) Find the numerical value of the probability in part (i), if $p_1 = 0.6$, $p_2 =
0.2$, and $p_3 = 0.3$.


1.10 An airport limousine service has two vans, the smaller of which can carry 
6 passengers and the larger 9 passengers. Let x and y be the respective 
numbers of passengers carried by the smaller and the larger van in a given 
trip, so that a suitable sample space S is given by: 


$$S = \{(x, y);\ x = 0, \ldots, 6 \ \text{and} \ y = 0, 1, \ldots, 9\}.$$


Also, suppose that, for all values of $x$ and $y$, the probabilities $P(\{(x, y)\})$
are equal. Finally, define the events A, B, and C as follows:


A = “the two vans together carry either 4 or 6 or 10 passengers,”
B = “the larger van carries twice as many passengers as the smaller van,”
C = “the two vans carry different numbers of passengers.”


Calculate the probabilities: P(A), P(B), and P(C). 


1.11 In the sample space S = (0, co), consider the events A, = (0,1 — 2), 
n=1,2,..., A=(0, 1), and suppose that P(A,) = 4. 
(i) Show that the sequence $\{A_n\}$ is increasing and that $\lim_{n \to \infty} A_n =
\bigcup_{n=1}^{\infty} A_n = A$.
(ii) Use part (i) and the appropriate theorem (cite it!) in order to calculate 
the probability P(A). 
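For a r.v. $X$, define the set function $P_X(B) = P(X \in B)$. Then $P_X$ is a prob-
ability function because: $P_X(B) \geq 0$ for all $B$, $P_X(\Re) = P(X \in \Re) = 1$, and, if
$B_j$, $j = 1, 2, \ldots$ are pairwise disjoint then, clearly, $(X \in B_j)$, $j \geq 1$, are also
pairwise disjoint and $X \in \big(\bigcup_{j=1}^{\infty} B_j\big) = \bigcup_{j=1}^{\infty} (X \in B_j)$. Therefore

$$P_X\Big(\bigcup_{j=1}^{\infty} B_j\Big) = P\Big[X \in \Big(\bigcup_{j=1}^{\infty} B_j\Big)\Big] = P\Big[\bigcup_{j=1}^{\infty} (X \in B_j)\Big] = \sum_{j=1}^{\infty} P(X \in B_j) = \sum_{j=1}^{\infty} P_X(B_j).$$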
Figure 2.1 Examples of Graphs of d.f.'s
The probability function $P_X$ is called the probability distribution of the r.v.
$X$. Its significance is extremely important because it tells us the probability that
$X$ takes values in any given set $B$. Indeed, much of probability and statistics
revolves around the distribution of r.v.'s in which we have an interest.

By selecting $B$ to be $(-\infty, x]$, $x \in \Re$, we have $P_X(B) = P(X \in (-\infty, x]) =
P(X \leq x)$. In effect, we define a point function which we denote by $F_X$; that is,
$F_X(x) = P(X \leq x)$, $x \in \Re$. The function $F_X$ is called the distribution function
(d.f.) of $X$. Clearly, if we know $P_X$, then we certainly know $F_X$. Somewhat un-
expectedly, the converse is also true. Namely, if we know the (relatively “few”)
probabilities $F_X(x)$, $x \in \Re$, then we can determine precisely all probabilities
$P_X(B)$ for $B$ subset of $\Re$. This converse is a deep theorem in probability that
we cannot deal with here. It is, nevertheless, the reason for which it is the d.f.
$F_X$ we deal with, a familiar point function for which so many calculus results
hold, rather than the unfamiliar set function $P_X$.

Clearly, the expressions $F_X(+\infty)$ and $F_X(-\infty)$ have no meaning because
$+\infty$ and $-\infty$ are not real numbers. They are defined as follows:

$$F_X(+\infty) = \lim_{n \to \infty} F_X(x_n),\ x_n \uparrow \infty \quad \text{and} \quad F_X(-\infty) = \lim_{n \to \infty} F_X(y_n),\ y_n \downarrow -\infty.$$

These limits exist because $x < y$ implies $(-\infty, x] \subset (-\infty, y]$ and hence
$P_X((-\infty, x]) = F_X(x) \leq F_X(y) = P_X((-\infty, y])$; that is, $F_X$ is monotone and bounded.
The d.f. of a r.v. $X$ has the following basic properties:


1. $0 \leq F_X(x) \leq 1$ for all $x \in \Re$;
2. $F_X$ is a nondecreasing function;
3. $F_X$ is continuous from the right;
4. $F_X(+\infty) = 1$, $F_X(-\infty) = 0$.


The first and the second properties are immediate from the definition of the
d.f.; the third follows by Theorem 2, by taking $x_n \downarrow x$; so does the fourth, by
taking $x_n \uparrow +\infty$, which implies $(-\infty, x_n] \uparrow \Re$, and $y_n \downarrow -\infty$, which implies
$(-\infty, y_n] \downarrow \emptyset$. Figures 2.1 and 2.2 show the graphs of the d.f.'s of some typical
cases.

Figure 2.1 panels: (a) Binomial for $n = 6$; (b) Poisson for $\lambda = 2$.

Figure 2.2 Examples of Graphs of d.f.'s. Panel: (d) $N(0, 1)$.

Now, suppose that the r.v. $X$ is discrete and takes on the values $x_j$, $j =
1, 2, \ldots, n$. Take $B = \{x_j\}$ and on the set $\{x_1, x_2, \ldots, x_n\}$ define the function $f_X$
as follows: $f_X(x_j) = P_X(\{x_j\})$. Next, extend $f_X$ over the entire $\Re$ by setting
$f_X(x) = 0$ for $x \neq x_j$, $j = 1, 2, \ldots, n$. Then $f_X(x) \geq 0$ for all $x$, and it is
clear that $P(X \in B) = \sum_{x_j \in B} f_X(x_j)$ for $B \subseteq \Re$. In particular, $\sum_{j=1}^{n} f_X(x_j) =
\sum_{x_j \in \Re} f_X(x_j) = P(X \in \Re) = 1$. The function $f_X$ just defined is called the
probability density function (p.d.f.) of the r.v. $X$. By selecting $B = (-\infty, x]$
for some $x \in \Re$, we have $F_X(x) = \sum_{x_j \leq x} f_X(x_j)$. Furthermore, if we assume at
this point that $x_1 < x_2 < \cdots < x_n$, it is clear that

$$f_X(x_j) = F_X(x_j) - F_X(x_{j-1}), \ j = 2, 3, \ldots, n \quad \text{and} \quad f_X(x_1) = F_X(x_1);$$

we may also allow $j$ to take the value 1 above by setting $F_X(x_0) = 0$. Likewise
if $X$ takes the values $x_j$, $j = 1, 2, \ldots$ These two relations state that, in the
case that $X$ is a discrete r.v. as above, either one of the $F_X$ or $f_X$ specifies
uniquely the other. Setting $F_X(x_j-)$ for the limit from the left (left-limit) of
$F_X$ at $x_j$, $F_X(x_j-) = \lim F_X(x)$ as $x \uparrow x_j$, we see that $F_X(x_j) - F_X(x_{j-1}) =
F_X(x_j) - F_X(x_j-)$, so that $f_X(x_j) = F_X(x_j) - F_X(x_j-)$. In other words, the
value of $f_X$ at $x_j$ is the size of the jump of $F_X$ at the point $x_j$. These points
are illustrated quite clearly in Figure 2.3. For a numerical example (associated
with Figure 2.3), let the r.v. $X$ take on the values: $-14$, $-6$, 5, 9, and 24 with
respective probabilities: 0.17, 0.28, 0.22, 0.22, and 0.11.
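For the numerical example associated with Figure 2.3, the d.f. is a right-continuous step function, and the p.d.f. can be recovered as the sizes of its jumps. The sketch below illustrates both directions; the helper names `df` and `jumps` are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

values = [-14, -6, 5, 9, 24]              # x_1 < x_2 < ... < x_5
probs = [0.17, 0.28, 0.22, 0.22, 0.11]    # f_X(x_j)
cumulative = list(accumulate(probs))      # F_X(x_j)


def df(x):
    """F_X(x) = P(X <= x): sum of f_X(x_j) over all x_j <= x."""
    k = bisect_right(values, x)           # number of values x_j with x_j <= x
    return cumulative[k - 1] if k > 0 else 0.0


# Recover the p.d.f. from the d.f.: f_X(x_j) = F_X(x_j) - F_X(x_{j-1}), F_X(x_0) = 0.
jumps = [
    df(v) - (df(values[j - 1]) if j > 0 else 0.0)
    for j, v in enumerate(values)
]

print([round(df(v), 2) for v in values])
print([round(j, 2) for j in jumps])
```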


Figure 2.3 
Several specific cases of discrete r.v.'s are discussed in Section 3 of the
following chapter. Also, Examples 4-12, 18-20, 26, 27, and 32-35 in Chapter 1 
lead to discrete r.v.'s. 

Now, suppose that $X$ is a continuous r.v., one which takes on all values in
a proper interval $I$ (finite or not) in the real line $\Re$, so that $I = (a, b)$ with
$-\infty \leq a < b \leq \infty$. Suppose further that there exists a function $f : I \to [0, \infty)$
having the following property: $F_X(x) = \int_a^x f(t)\,dt$, $x \in I$. In particular,

$$\int_a^b f(t)\,dt = F_X(b) = P(X \leq b) = P(a \leq X \leq b).$$

If $I$ is not all of $\Re$, extend $f$ off $I$ by setting $f(x) = 0$ for $x \notin I$. Thus, for
all $x$: $f(x) \geq 0$ and $F_X(x) = \int_{-\infty}^{x} f(t)\,dt$. As has already been pointed out
elsewhere, $F_X$ uniquely determines $P_X$. The implication of it is that $P(X \in
B) = P_X(B) = \int_B f(t)\,dt$, $B \subseteq \Re$, and, in particular,

$$\int_{\Re} f(t)\,dt = \int_{-\infty}^{\infty} f(t)\,dt = P(X \in \Re) = 1.$$

The function $f$ with the properties: $f(x) \geq 0$ for all $x$ and $P(X \in B) = \int_B f(t)\,dt$,
$B \subseteq \Re$, is the p.d.f. of the r.v. $X$. In order to emphasize its association with the
r.v. $X$, we often write $f_X$.

Most of the continuous r.v.’s we are dealing with in this book do have p.d.f.'s. 
In Section 3 of the following chapter, a number of such r.v.'s will be presented 
explicitly. 

Also, Examples 13-17, 21-25, and 36-40 in Chapter 1, under reasonable
assumptions, lead to continuous r.v.'s, as will be seen on various occasions
later. Continuous r.v.'s having p.d.f.'s, actually, form a subclass of all continuous
r.v.'s and are referred to as absolutely continuous r.v.'s. In this book, the term
continuous r.v. will be used in the sense of an absolutely continuous r.v.

It is to be observed that for a continuous r.v. $X$, $P(X = x) = 0$ for all
$x \in \Re$. That is, the probability that $X$ takes on any specific value $x$ is 0; $X$
takes on values with positive probabilities in a nondegenerate interval around
$x$. That $P(X = x) = 0$ follows, of course, from the definition of the p.d.f. of a
continuous r.v., as

$$P(X = x) = \int_{\{x\}} f(t)\,dt = 0.$$

For a case of a genuine (absolutely) continuous r.v., refer to Example 37
in Chapter 1 and let $X$ and $Y$ be r.v.'s denoting the cartesian coordinates of
the point $P$ of impact. Then the distance of $P$ from the origin is the r.v.
$R = \sqrt{X^2 + Y^2}$, which truly takes every value in $[0, \infty)$. As will be seen, it
is reasonable to assume that $X$ and $Y$ are independently normally distributed
with mean 0 and variance $\sigma^2$. This leads to the fact that $R^2$ is a multiple of a
chi-square distributed r.v., so that the p.d.f. of $R$ is precisely determined. (See
Exercise 2.14 in Chapter 5.)

If $X$ is a continuous r.v. with p.d.f. $f_X$, then its d.f. $F_X$ is given by $F_X(x) =
\int_{-\infty}^{x} f_X(t)\,dt$, $x \in \Re$, so that $f_X$ uniquely determines $F_X$. It is also true that
$\frac{dF_X(x)}{dx} = f_X(x)$ (for continuity points $x$ of $f_X$). Thus, $F_X$ also determines $f_X$.
In summary then, with a r.v. $X$, we have associated three quantities:


(i) The probability distribution of $X$, denoted by $P_X$, which is a set function
and gives the probabilities $P(X \in B)$, $B \subseteq \Re$.

(ii) The d.f. of $X$, denoted by $F_X$, which is a point function and gives the
probabilities $P(X \in (-\infty, x]) = P(X \leq x) = F_X(x)$, $x \in \Re$.

(iii) The p.d.f. $f_X$ which is a nonnegative (point) function and gives all prob-
abilities we may be interested in, either through a summation (for the
discrete case) or through an integration (for the continuous case). Thus,
for every $x \in \Re$:

$$F_X(x) = P(X \leq x) = \begin{cases} \sum_{x_j \leq x} f_X(x_j) & \text{for the discrete case} \\ \int_{-\infty}^{x} f_X(t)\,dt & \text{for the continuous case,} \end{cases}$$

or, more generally:

$$P_X(B) = P(X \in B) = \begin{cases} \sum_{x_j \in B} f_X(x_j) & \text{for the discrete case} \\ \int_{B} f_X(t)\,dt & \text{for the continuous case.} \end{cases}$$

In the discrete case, $f_X(x_j) = P(X = x_j) = P_X(\{x_j\})$, whereas in the
continuous case, $P(X = x) = 0$ for every $x$, so that $f_X(x)$ is not itself a
probability. The p.d.f. $f_X$, clearly, determines the
d.f. $F_X$, and the converse is also true. Of course, the p.d.f. $f_X$ also determines
the probability distribution $P_X$, but what is also true, although not obvious, is
that the d.f. $F_X$ determines the probability distribution $P_X$.

Given a r.v. $X$, we are primarily if not exclusively interested in its probability
distribution $P_X$. Because of the above, it suffices to restrict ourselves either to
the d.f. $F_X$, or even better, to the p.d.f. $f_X$, which is easier to work with.

The notation $X \sim F_X$ or $X \sim f_X$ stands for the statement that the r.v. $X$ has
the d.f. $F_X$ or p.d.f. $f_X$, respectively.

This section is concluded with the following observation and some follow-
up discussion. If $Q$ is any probability distribution in $\Re$, then there is a r.v. $X$
such that $P_X = Q$. To see this, let $Y$ be a r.v. with p.d.f. $f_Y(y) = 1$ in $[0, 1]$ (and
0 outside this interval). Then its d.f. $F_Y$ is given by:

$$F_Y(y) = 0,\ y < 0; \quad F_Y(y) = y,\ 0 \leq y \leq 1; \quad F_Y(y) = 1,\ y > 1.$$

Next, let $F$ be the d.f. determined by $Q$, and suppose it is strictly increasing,
so that the inverse $F^{-1}$ exists. Set $X = F^{-1}(Y)$. Then we assert that $F_X = F$.
Indeed,

$$F_X(x) = P(X \leq x) = P[F^{-1}(Y) \leq x] = P\{F[F^{-1}(Y)] \leq F(x)\} = P[Y \leq F(x)] = F_Y(F(x)) = F(x),$$

because $0 \leq F(x) \leq 1$. That is, $F_X(x) = F(x)$, $x \in \Re$. The same result follows
even if $F$ is not strictly increasing by a modified definition of the inverse $F^{-1}$.
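The construction $X = F^{-1}(Y)$ with $Y$ uniform on $[0, 1]$ can be tried out numerically. As an illustrative assumption (not taken from the text), take $F(x) = 1 - e^{-\lambda x}$ for $x > 0$, whose inverse is $F^{-1}(y) = -\log(1 - y)/\lambda$; the sample size, seed, and function names below are likewise arbitrary choices.

```python
import math
import random


def target_df(x, lam=1.0):
    """Assumed target d.f.: F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def inverse_df(y, lam=1.0):
    """Inverse of the assumed F on [0, 1): F^{-1}(y) = -log(1 - y) / lam."""
    return -math.log(1.0 - y) / lam


rng = random.Random(0)  # fixed seed so the run is reproducible
# rng.random() returns values in [0, 1), so 1 - y is never 0.
sample = [inverse_df(rng.random()) for _ in range(100_000)]


def empirical_df(x):
    """Fraction of simulated values of X = F^{-1}(Y) that are <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)


for x in (0.5, 1.0, 2.0):
    print(x, round(empirical_df(x), 3), round(target_df(x), 3))
```

The empirical d.f. of the simulated values should be close to $F$ at every point, which is what the assertion $F_X = F$ predicts.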

Along the same line, it makes sense to ask whether a given function $f$ is the
p.d.f. of a r.v. $X$. The required conditions for this to be the case are: $f(x) \geq 0$
for all $x$, and either

$$f(x_j) > 0,\ j = 1, 2, \ldots, \ \text{with} \ \sum_j f(x_j) = 1, \ \text{and} \ f(x) = 0 \ \text{for all} \ x \neq x_j,\ j \geq 1;$$

or $\int_{-\infty}^{\infty} f(x)\,dx = 1$.


This is so because in either case $f$ defines a d.f. $F$ and hence a r.v. $X$ (discrete
in the former case, and continuous in the latter case) with d.f. $F$.
Let us conclude this section with the following concrete examples.


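The number of light switch turn-ons at which the first failure occurs is a r.v. $X$
whose p.d.f. is given by: $f(x) = c\left(\frac{9}{10}\right)^{x-1}$, $x = 1, 2, \ldots$ (and 0 otherwise).
(i) Determine the constant $c$.
(ii) Calculate the probability that the first failure will not occur until after the
10th turn-on.
(iii) Determine the corresponding d.f. $F$.


Hint: At this point, recall how we sum geometric series; namely, $\sum_{n=k}^{\infty} t^n =
\frac{t^k}{1-t}$, $|t| < 1$, $k = 0, 1, \ldots$


DISCUSSION
(i) The constant $c$ is determined through the relationship: $\sum_{x=1}^{\infty} f(x) =
1$ or $\sum_{x=1}^{\infty} c\left(\frac{9}{10}\right)^{x-1} = 1$. However, $\sum_{x=1}^{\infty} c\left(\frac{9}{10}\right)^{x-1} = c \sum_{x=0}^{\infty} \left(\frac{9}{10}\right)^{x} =
c \cdot \frac{1}{1 - \frac{9}{10}} = 10c$, so that $c = \frac{1}{10}$.

(ii) Here $P(X > 10) = P(X \geq 11) = c \sum_{x=11}^{\infty} \left(\frac{9}{10}\right)^{x-1} = c \sum_{x=10}^{\infty} \left(\frac{9}{10}\right)^{x} =
c \cdot \frac{(9/10)^{10}}{1 - \frac{9}{10}} = c \cdot 10 \left(\frac{9}{10}\right)^{10} = (0.9)^{10} \simeq 0.349$.

(iii) First, for $x < 1$, $F(x) = 0$. Next, for $x \geq 1$ ($x$ an integer), $F(x) = \sum_{t=1}^{x} c\left(\frac{9}{10}\right)^{t-1} =
1 - \sum_{t=x+1}^{\infty} c\left(\frac{9}{10}\right)^{t-1} = 1 - c \cdot \frac{(9/10)^{x}}{1 - \frac{9}{10}} = 1 - \left(\frac{9}{10}\right)^{x}$.

Thus, $F(x) = 0$ for $x < 1$, and $F(x) = 1 - \left(\frac{9}{10}\right)^{x}$ for $x \geq 1$ (with $x$ replaced by its
integer part when $x$ is not an integer).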
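The three answers in the light-switch example can be checked numerically. The sketch assumes the p.d.f. $f(x) = c(9/10)^{x-1}$, $x = 1, 2, \ldots$, with $c = 1/10$, as derived in the discussion; the function names and the cutoff of 1000 terms for the infinite sum are illustrative.

```python
def pmf(x, c=0.1):
    """f(x) = c * (9/10)**(x - 1) for x = 1, 2, ...; 0 otherwise."""
    return c * 0.9 ** (x - 1) if x >= 1 else 0.0


def df(x):
    """F(x) = 1 - (9/10)**x for integer x >= 1; 0 for x < 1."""
    return 1 - 0.9 ** x if x >= 1 else 0.0


# (i) With c = 1/10 the probabilities add up to 1 (up to a negligible tail).
partial_total = sum(pmf(x) for x in range(1, 1001))

# (ii) P(X > 10) = 1 - P(X <= 10), to be compared with (0.9)**10.
p_after_10 = 1 - sum(pmf(x) for x in range(1, 11))

print(partial_total, p_after_10, 0.9 ** 10)
```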










The recorded temperature in an engine is a r.v. $X$ whose p.d.f. is given by:
$f(x) = n(1 - x)^{n-1}$, $0 \leq x \leq 1$ (and 0 otherwise), where $n \geq 1$ is a known
integer.

(i) Show that $f$ is, indeed, a p.d.f.

(ii) Determine the corresponding d.f. $F$.


DISCUSSION


(i) Because $f(x) \geq 0$ for all $x$, we simply have to check that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. To
this end, $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 n(1 - x)^{n-1}\,dx = -(1 - x)^n \big|_0^1 = 1$.

(ii) First, $F(x) = 0$ for $x < 0$, whereas for $0 \leq x \leq 1$, $F(x) = \int_0^x n(1 - t)^{n-1}\,dt =
-(1 - t)^n \big|_0^x$ (from part (i)), and this is equal to: $-(1 - x)^n + 1 = 1 - (1 - x)^n$.
Thus,

$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - (1 - x)^n, & 0 \leq x \leq 1 \\ 1, & x > 1. \end{cases}$$
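The engine-temperature example can be verified by numerical integration: the p.d.f. should integrate to 1 over $[0, 1]$, and its integral from 0 to $x$ should match $1 - (1 - x)^n$. The midpoint rule, the step count, and the choice $n = 4$ below are illustrative assumptions.

```python
def pdf(x, n):
    """f(x) = n * (1 - x)**(n - 1) on [0, 1], and 0 otherwise."""
    return n * (1 - x) ** (n - 1) if 0 <= x <= 1 else 0.0


def df(x, n):
    """F(x) = 0 for x < 0, 1 - (1 - x)**n on [0, 1], and 1 for x > 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 1 - (1 - x) ** n


def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


n = 4
total_mass = midpoint_integral(lambda t: pdf(t, n), 0.0, 1.0)
partial_mass = midpoint_integral(lambda t: pdf(t, n), 0.0, 0.3)
print(total_mass, partial_mass, df(0.3, n))
```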
2.1 A sample space describing a three-children family is as follows: S =
{bbb, bbg, bgb, gbb, bgg, gbg, ggb, ggg}, and assume that all eight out- 
comes are equally likely to occur. Next, let X be the r.v. denoting the 
number of girls in such a family. Then: 

(i) Determine the set of all possible values of X. 
(ii) Determine the p.d.f. of X. 
(iii) Calculate the probabilities: P(X > 2), P(X < 2). 


2.2 Arv. X has d.f. F given by: 


0, <0 
F(a) = } 2c(x? — ix’), 0<x<2 
l, > 2. 


(i) Determine the corresponding p.d.f. f. 
(ii) Determine the constant c. 


2.3 The r.v. X has d.f. F given by: 


0, v=0 
Fo=jix-xX%2+x, 0O<xr<l 
1 > 1. 


? 


(1) Determine the corresponding p.d.f. f. 
(ii) Calculate the probability P(X > 4). 


2.4 The r.v. X has d.f. F given by: 


0, x<d 
0.1, 4<xw<5b 
0.4, 5<x<6 
F = > = 
= lon syes 
0.9, 8<x<9 
1, x=. 


(i) Draw the graph of F. 
(ii) Calculate the probabilities 


P(X <65), P(X>8.D, P5 < x< 8). 


2.5 Let X be a r.v. with p.d.f. f(x) = cx +D, for x > 1, where c is a positive 
constant. 
(1) Determine the constant c, so that f is, indeed, a p.d.f. 
(ii) Determine the corresponding d.f. F. 


2.6 Let X be a r.v. with p.d.f. f(x) = cx + d, for 0 < x < 1, and suppose that 
P(X > $) = §. Then: 
(i) Determine the constants c and d. 
(ii) Find the d.f. F of X. 
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2.7 Show that the function f(x) = GF, x= 1, 2, ...is a p.d.f. 


2.8 For what value of $c$ is the function $f(x) = c\alpha^x$, $x = 0, 1, \ldots$ a p.d.f.?
The quantity $\alpha$ is a number such that $0 < \alpha < 1$.


2.9 For what value of the positive constant $c$ is the function $f(x) = c^x$, $x =
1, 2, \ldots$ a p.d.f.?


2.10 The p.d.f. of a r.v. X is f(x) = cG), for x = 0, 1, ..., where cis a positive 
constant. 
G) Determine the value of c. 
Gi) Calculate the probability P(X > 3). 


2.11 The r.v. $X$ has p.d.f. $f$ given by: $f(x) = c(1 - x^2)$, $-1 < x < 1$.
(i) Determine the constant $c$.
(ii) Calculate the probability $P(-0.9 < X < 0.9)$.


2.12 Let X be a r.v. denoting the lifetime of an electrical equipment, and sup- 
pose that the p.d.f. of X is: f(x) = ce~™, for x > 0 (for some constant 
c > 0). 
(i) Determine the constant c. 
(ii) Calculate the probability that X is at least equal to 10 (time units). 
(iii) If the probability in part (ii) is 0.5, what is the value of c? 


2.13 The r.v. X has the so-called Pareto p.d.f. given by: f(x) = — for x > 1, 
where a is a positive constant. 
(i) Verify that f is, indeed, a p.d.f. 
(ii) Calculate the probability P(X > c), for some c > 1. 


2.14 Suppose that the rv. X takes on the values 0, 1, ... with the respective 
probabilities P(X = j) = f(J) = 33,7 =0,1,.... Then: 
(i) Determine the constant c. 
Compute the probabilities: 
(ii) P(X > 3). 
(iii) P(X =2k +1, k=0,1,...). 
(iv) P(X = 3k +1, k=0,1,...). 





2.15 Let X be a r.v. with p.d.f. f whose graph is given below. 
Without calculating f and by using geometric arguments, compute the 
following probabilities: 


P(X <3) Pl<sX<D Psa, POS 
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2.16 Let X be the r.v. denoting the number of a certain item sold by a merchant 
in a given day, and suppose that its p.d.f. is given by: 


$$f(x) = \left(\frac{1}{2}\right)^{x+1}, \quad x = 0, 1, \ldots$$


Calculate the following probabilities: 
(i) No items are sold. 

(ii) More than three items are sold. 

(iii) An odd number of items is sold. 


2.17 Suppose a r.v. $X$ has p.d.f. given by: $f(x) = \lambda e^{-\lambda x}$, $x > 0$, ($\lambda > 0$), and
you are invited to bet whether the observed value $x$ of $X$ would be $> c$
or $< c$ for some positive constant $c$.
(i) For what $c$ would you bet in favor of $x > c$?
(ii) What is the answer in part (i) if $\lambda = 4 \log 2$? (log is the natural
logarithm.)


2.18 The lifetime in hours of electric tubes is a r.v. $X$ with p.d.f. $f(x) = c^2 x e^{-cx}$,
for $x > 0$, where $c$ is a positive constant.
(i) Determine the constant $c$ for which $f$ is, indeed, a p.d.f.
(ii) Calculate the probability that the lifetime will be at least $t$ hours.
(iii) Find the numerical value in part (ii) for $c = 0.2$ and $t = 10$.


2.19 Let X be the r.v. denoting the number of forms required to be filled out 
by a contractor for participation in contract bids, where the values of 
X are 1, 2, 3, 4, and 5, and suppose that the respective probabilities are 
proportional to x; that is, P(X = x) = f(x) = cx, x = 1,..., 5. 
(i) Determine the constant $c$.
(ii) Calculate the probabilities:

P(X < 3), P(2 < X < 4).


2.20 The recorded temperature in an engine is a r.v. $X$ whose p.d.f. is given
by: $f(x) = n(1 - x)^{n-1}$, $0 \leq x \leq 1$ ($n \geq 1$, known integer). The engine
is equipped with a thermostat which is activated when the temperature 
exceeds a specified level xo. If the probability of the thermostat being 
activated is 1/10°”, determine xo. 


2.3 Conditional Probability and Related Results


Conditional probability is a probability in its own right, as will be seen, and it 
is an extremely useful tool in calculating probabilities. Essentially, it amounts 
to suitably modifying a sample space S, associated with a random experiment, 
on the evidence that a certain event has occurred. Consider the following 
examples, by way of motivation, before a formal definition is given. 
In tossing three distinct coins once (Example 26 in Chapter 1), consider the
events B = “exactly 2 heads occur” = {HHT, HTH, THH}, A = “2 specified
coins (e.g., coins #1 and #2) show heads” = {HHH, HHT}. Then $P(B) = \frac{3}{8}$
and $P(A) = \frac{2}{8} = \frac{1}{4}$. Now, suppose we are told that event $B$ has occurred and
we are asked to reevaluate the probability of $A$ on the basis of this evidence.
Clearly, what really matters here is the event $B$, and, given that $B$ has occurred,
the event $A$ occurs only if the sample point HHT appeared; that is, the event
{HHT} = $A \cap B$ occurred. The required probability is then $\frac{1}{3} = \frac{1/8}{3/8} = \frac{P(A \cap B)}{P(B)}$,
and the notation employed is $P(A \mid B)$ (probability of $A$, given that $B$ has
occurred or, just, given $B$). Thus, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. Observe that $P(A \mid B) =
\frac{1}{3} > \frac{1}{4} = P(A)$.







In rolling two distinct dice once (Example 27 in Chapter 1), consider the event
$B$ defined by: B = “the sum of numbers on the upper faces is $\leq 5$”, so that B =
{(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (4, 1)}, and let A =
“the sum of numbers on the upper faces is $\geq 4$.” Then $A^c$ = “the sum of numbers
on the upper faces is $\leq 3$” = {(1, 1), (1, 2), (2, 1)}, so that $P(B) = \frac{10}{36} = \frac{5}{18}$ and
$P(A) = 1 - P(A^c) = 1 - \frac{3}{36} = \frac{33}{36} = \frac{11}{12}$. Next, if we are told that $B$ has
occurred, then the only way that $A$ occurs is if $A \cap B$ occurs, where $A \cap B$ =
“the sum of numbers on the upper faces is both $\geq 4$ and $\leq 5$ (i.e., either 4
or 5)” = {(1, 3), (1, 4), (2, 2), (2, 3), (3, 1), (3, 2), (4, 1)}. Thus, $P(A \mid B) = \frac{7}{10} =
\frac{7/36}{10/36} = \frac{P(A \cap B)}{P(B)}$, and observe that $P(A \mid B) = \frac{7}{10} < \frac{11}{12} = P(A)$.
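The dice example lends itself to direct enumeration of the 36 equally likely outcomes. The sketch below computes $P(B)$, $P(A)$, and $P(A \mid B)$ as exact fractions; the helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two distinct dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) for equally likely outcomes; event is a predicate on outcomes."""
    return Fraction(sum(1 for s in sample_space if event(s)), len(sample_space))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda s: a(s) and b(s)) / prob(b)


def B(s):
    return sum(s) <= 5   # sum of the two upper faces is at most 5


def A(s):
    return sum(s) >= 4   # sum of the two upper faces is at least 4


print(prob(B), prob(A), cond_prob(A, B))
```

Conditioning on $B$ simply restricts the count to the 10 outcomes in $B$, of which 7 also lie in $A$.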








In recording the gender of children in a two-children family (Example 30 in
Chapter 1), let B = {bg, gb} and let A = “older child is a boy” = {bb, bg}, so
that $A \cap B$ = {bg}. Then $P(B) = \frac{1}{2} = P(A)$, $P(A \mid B) = \frac{1}{2}$.


These examples motivate the following definition of conditional prob- 
ability. 


DEFINITION 1 

The conditional probability of an event $A$, given the event $B$ with
$P(B) > 0$, is denoted by $P(A \mid B)$ and is defined by: $P(A \mid B) = P(A \cap B)/
P(B)$.


Replacing $B$ by the entire sample space $S$, we are led back to the (uncondi-
tional) probability of $A$, as $P(A \mid S) = \frac{P(A \cap S)}{P(S)} = \frac{P(A)}{1} = P(A)$. Thus, the conditional prob-
ability is a generalization of the concept of probability where $S$ is restricted to
an event $B$.

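That the conditional probability is, indeed, a probability is seen formally
as follows: $P(A \mid B) \geq 0$ for every $A$ by definition;

$$P(S \mid B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1;$$

and if $A_1, A_2, \ldots$ are pairwise disjoint, then

$$P\Big(\bigcup_{j=1}^{\infty} A_j \,\Big|\, B\Big) = \frac{P\big[\big(\bigcup_{j=1}^{\infty} A_j\big) \cap B\big]}{P(B)} = \frac{P\big[\bigcup_{j=1}^{\infty} (A_j \cap B)\big]}{P(B)} = \frac{\sum_{j=1}^{\infty} P(A_j \cap B)}{P(B)} = \sum_{j=1}^{\infty} \frac{P(A_j \cap B)}{P(B)} = \sum_{j=1}^{\infty} P(A_j \mid B).$$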


It is to be noticed, furthermore, that the P(A | B) can be smaller or larger 
than the P(A), or equal to the P(A). The case that P(A | B) = P(A)is of special 
interest and will be discussed more extensively in the next section. This point 
is made by Examples 12, 13, and 14. 

Here are another three examples pertaining to conditional probabilities. 


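EXAMPLE 15

When we are recording the number of particles emitted by a certain radioactive
source within a specified period of time (Example 35 in Chapter 1), we are going
to see that, if $X$ is the number of particles emitted, then $X$ is a r.v. taking on
the values $0, 1, \ldots$ and that a suitable p.d.f. for it is $f_X(x) = e^{-\lambda}\frac{\lambda^x}{x!}$, $x = 0, 1, \ldots$,
for some constant $\lambda > 0$. Next, let $B$ and $A$ be the events defined by: $B =
(X \ge 10)$, $A = (X \le 11)$, so that $A \cap B = (10 \le X \le 11) = (X = 10 \text{ or } X = 11)$.
Then

\[ P(B) = \sum_{x=10}^{\infty} e^{-\lambda}\frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=10}^{\infty} \frac{\lambda^x}{x!}, \]

\[ P(A) = \sum_{x=0}^{11} e^{-\lambda}\frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{11} \frac{\lambda^x}{x!}, \quad \text{and} \]

\[ P(A \mid B) = \Bigl(e^{-\lambda}\frac{\lambda^{10}}{10!} + e^{-\lambda}\frac{\lambda^{11}}{11!}\Bigr) \Big/ e^{-\lambda} \sum_{x=10}^{\infty} \frac{\lambda^x}{x!}. \]

Once again, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. For a numerical example, take $\lambda = 10$. Then
we have (by means of Poisson tables):

\[ P(B) \simeq 0.5421, \quad P(A) \simeq 0.6968, \quad \text{and} \quad P(A \mid B) \simeq 0.441. \]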
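
The table lookups can be reproduced numerically. The following is a minimal sketch, assuming the Poisson p.d.f. with parameter 10 used in the numerical example; `poisson_pmf` is a hypothetical helper written for this illustration, not a library function.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson p.d.f.: exp(-lam) * lam**x / x!, for x = 0, 1, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 10
# P(B) = P(X >= 10), computed through the complement P(X <= 9).
p_b = 1 - sum(poisson_pmf(x, lam) for x in range(10))
# P(A) = P(X <= 11).
p_a = sum(poisson_pmf(x, lam) for x in range(12))
# P(A and B) = P(X = 10) + P(X = 11).
p_ab = poisson_pmf(10, lam) + poisson_pmf(11, lam)
p_a_given_b = p_ab / p_b

print(round(p_b, 4), round(p_a, 4), round(p_a_given_b, 3))
```

Computing $P(B)$ through the complement avoids truncating the infinite sum.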


EXAMPLE 16

When recording the lifetime of an electronic device, or of an electrical appliance,
etc. (Example 36 in Chapter 1), if $X$ is the lifetime under consideration,
then $X$ is a r.v. taking values in $[0, \infty)$, and a suitable p.d.f. for it is seen to be
the function $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$, for some constant $\lambda > 0$. Let $B$ and $A$ be
the events: $B =$ “at the end of 5 time units, the equipment was still operating”
$= (X > 5)$, $A =$ “the equipment lasts for no more than 2 additional time
units” $= (X \le 7)$. Then $A \cap B = (5 < X \le 7)$, and:

\[ P(B) = \int_5^{\infty} \lambda e^{-\lambda x}\,dx = e^{-5\lambda}, \quad P(A) = \int_0^7 \lambda e^{-\lambda x}\,dx = 1 - e^{-7\lambda}, \]

\[ P(A \cap B) = \int_5^7 \lambda e^{-\lambda x}\,dx = e^{-5\lambda} - e^{-7\lambda}, \quad \text{so that} \]

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{e^{-5\lambda} - e^{-7\lambda}}{e^{-5\lambda}} = 1 - e^{-2\lambda}. \]

Take, for instance, $\lambda = \frac{1}{10}$. Then, given that $e^{-1} \simeq 0.36788$, the preceding
probabilities are:

\[ P(B) \simeq 0.607, \quad P(A) \simeq 0.503, \quad \text{and} \quad P(A \mid B) \simeq 0.181. \]
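
These values follow from the closed forms above and can be checked in a few lines. This is a sketch under the assumption that the rate is 1/10, which is the value that reproduces the three probabilities quoted; `survival` is a hypothetical helper name.

```python
from math import exp

def survival(t, lam):
    """P(X > t) for the p.d.f. lam * exp(-lam * x), x > 0."""
    return exp(-lam * t)

lam = 0.1                                       # assumed rate, see lead-in
p_b = survival(5, lam)                          # P(X > 5)
p_a = 1 - survival(7, lam)                      # P(X <= 7)
p_ab = survival(5, lam) - survival(7, lam)      # P(5 < X <= 7)
p_a_given_b = p_ab / p_b

print(round(p_b, 3), round(p_a, 3), round(p_a_given_b, 3))
```

Note that `p_a_given_b` coincides with `1 - survival(2, lam)`: given survival to time 5, the chance of failing within 2 more time units does not depend on the 5 units already elapsed.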


EXAMPLE 17

If for the events $A$ and $B$, $P(A)P(B) > 0$, then show that: $P(A \mid B) > P(A)$
if and only if $P(B \mid A) > P(B)$. Likewise, $P(A \mid B) < P(A)$ if and only if
$P(B \mid A) < P(B)$.


DISCUSSION Indeed, $P(A \mid B) > P(A)$ is equivalent to $\frac{P(A \cap B)}{P(B)} > P(A)$ or
$\frac{P(A \cap B)}{P(A)} > P(B)$ or $P(B \mid A) > P(B)$. Likewise, $P(A \mid B) < P(A)$ is equivalent
to $\frac{P(A \cap B)}{P(B)} < P(A)$ or $\frac{P(A \cap B)}{P(A)} < P(B)$ or $P(B \mid A) < P(B)$.

This section is concluded with three simple but very useful results. They 
are the so-called multiplicative theorem, the total probability theorem, and the 
Bayes formula. 














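THEOREM 3 (Multiplicative Theorem) For any $n$ events $A_1, \ldots, A_n$ with $P\bigl(\bigcap_{j=1}^{n-1} A_j\bigr) >
0$, it holds:

\[ P\Bigl(\bigcap_{j=1}^{n} A_j\Bigr) = P(A_n \mid A_1 \cap \cdots \cap A_{n-1})\, P(A_{n-1} \mid A_1 \cap \cdots \cap A_{n-2}) \cdots P(A_2 \mid A_1)\, P(A_1). \]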











Its justification is simple, is done by induction, and is left as an exercise 
(see Exercise 3.8). Its significance is that we can calculate the probability of 
the intersection of n events, step by step, by means of conditional probabilities. 
The calculation of these conditional probabilities is far easier. Here is a simple 
example which amply illustrates the point. 


EXAMPLE 18

An urn contains 10 identical balls of which 5 are black, 3 are red, and 2 are
white. Four balls are drawn one at a time and without replacement. Find the
probability that the first ball is black, the second red, the third white, and the
fourth black.

DISCUSSION Denoting by $B_1$ the event that the first ball is black, and
likewise for $R_2$, $W_3$, and $B_4$, the required probability is:

\[ P(B_1 \cap R_2 \cap W_3 \cap B_4) = P(B_4 \mid B_1 \cap R_2 \cap W_3)\, P(W_3 \mid B_1 \cap R_2)\, P(R_2 \mid B_1)\, P(B_1). \]

Assuming equally likely outcomes at each step, we have:

\[ P(B_1) = \frac{5}{10}, \quad P(R_2 \mid B_1) = \frac{3}{9}, \quad P(W_3 \mid B_1 \cap R_2) = \frac{2}{8}, \quad P(B_4 \mid B_1 \cap R_2 \cap W_3) = \frac{4}{7}. \]

Therefore,

\[ P(B_1 \cap R_2 \cap W_3 \cap B_4) = \frac{4}{7} \times \frac{2}{8} \times \frac{3}{9} \times \frac{5}{10} = \frac{1}{42} \simeq 0.024. \]
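
The step-by-step product can be compared with a brute-force count over all ordered draws. The sketch below is only an illustration; the ball labels and variable names are made up for the purpose.

```python
from fractions import Fraction
from itertools import permutations

# Multiplicative theorem: multiply the conditional probabilities in turn.
chain = Fraction(5, 10) * Fraction(3, 9) * Fraction(2, 8) * Fraction(4, 7)

# Brute-force check: label the 10 balls and list every ordered draw of 4.
balls = ["b"] * 5 + ["r"] * 3 + ["w"] * 2
draws = list(permutations(range(10), 4))   # 10 * 9 * 8 * 7 ordered draws
hits = sum(1 for d in draws
           if [balls[i] for i in d] == ["b", "r", "w", "b"])
brute = Fraction(hits, len(draws))

print(chain, brute, float(chain))
```

The enumeration has to inspect thousands of draws, whereas the conditional-probability route needs four fractions; this is the practical point of the theorem.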

For the formulation of the next result, the concept of a partition of $S$ is
required. The events $\{A_1, A_2, \ldots, A_n\}$ form a partition of $S$, if these events are
pairwise disjoint, $A_i \cap A_j = \varnothing$, $i \ne j$, and their union is $S$, $\bigcup_{j=1}^{n} A_j = S$. Then
it is obvious that any event $B$ in $S$ may be expressed as follows, in terms of a
partition of $S$; namely, $B = \bigcup_{j=1}^{n} (A_j \cap B)$. Furthermore,

\[ P(B) = \sum_{j=1}^{n} P(A_j \cap B) = \sum_{j=1}^{n} P(B \mid A_j)P(A_j), \quad \text{provided } P(A_j) > 0 \text{ for all } j. \]

The concept of partition is defined similarly for countably infinitely many events,
and the probability $P(B)$ is expressed likewise. In the sequel, by writing $j =
1, 2, \ldots$ and $\sum_j$, we mean to include both cases, finitely many indices and
countably infinitely many indices.

Thus, we have the following result. 





THEOREM 4 (Total Probability Theorem) Let $\{A_1, A_2, \ldots\}$ be a partition of $S$, and let
$P(A_j) > 0$ for all $j$. Then, for any event $B$,

\[ P(B) = \sum_j P(B \mid A_j)P(A_j). \]








The significance of this result is that, if it happens that we know the
probabilities of the partitioning events $P(A_j)$, as well as the conditional
probabilities of $B$, given $A_j$, then these quantities may be combined, according
to the preceding formula, to produce the probability $P(B)$. The probabilities
$P(A_j)$, $j = 1, 2, \ldots$ are referred to as a priori or prior probabilities. The
following examples illustrate the theorem and also demonstrate its usefulness.


EXAMPLE 19

In reference to Example 2 in Chapter 1, calculate the probability $P(+)$.


DISCUSSION Without having to refer specifically to a sample space, it is
clear that the events $D$ and $N$ form a partition. Then,

\[ P(+) = P(+ \text{ and } D) + P(+ \text{ and } N) = P(+ \mid D)P(D) + P(+ \mid N)P(N). \]

Here the a priori probabilities are $P(D) = p_1$, $P(N) = 1 - p_1$, and

\[ P(+ \mid D) = 1 - P(- \mid D) = 1 - p_3, \quad P(+ \mid N) = p_2. \]

Therefore, $P(+) = (1 - p_3)p_1 + p_2(1 - p_1)$. For a numerical application, take
$p_1 = 0.02$ and $p_2 = p_3 = 0.01$. Then $P(+) = 0.0296$. So, on the basis of this
testing procedure, about 2.96% of the population would test positive.


EXAMPLE 20

The proportions of motorists in a given gas station using regular unleaded gasoline,
extra unleaded, and premium unleaded over a specified period of time
are 40%, 35%, and 25%, respectively. The respective proportions of those who fill
their tanks are 30%, 50%, and 60%. What is the probability that a motorist selected
at random from among the patrons of the gas station under consideration and
for the specified period of time will fill his/her tank?


DISCUSSION Denote by $R$, $E$, and $P$ the events of a motorist using unleaded
gasoline which is regular, extra unleaded, and premium, respectively,
and by $F$ the event of having the tank filled. Then the translation into terms of
probabilities of the proportions given above is:

\[ P(R) = 0.40, \quad P(E) = 0.35, \quad P(P) = 0.25, \]
\[ P(F \mid R) = 0.30, \quad P(F \mid E) = 0.50, \quad P(F \mid P) = 0.60. \]

Then the required probability is:

\[ \begin{aligned}
P(F) &= P((F \cap R) \cup (F \cap E) \cup (F \cap P)) \\
&= P(F \cap R) + P(F \cap E) + P(F \cap P) \\
&= P(F \mid R)P(R) + P(F \mid E)P(E) + P(F \mid P)P(P) \\
&= 0.30 \times 0.40 + 0.50 \times 0.35 + 0.60 \times 0.25 \\
&= 0.445.
\end{aligned} \]
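
Both numerical applications of the total probability theorem reduce to the same weighted sum. A minimal sketch follows; the function name `total_probability` and its argument order are hypothetical choices made for this illustration.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum over j of P(B | A_j) * P(A_j), for a partition {A_j}."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the prior probabilities of a partition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Gas station: partition by fuel type, B = "tank is filled".
p_fill = total_probability([0.40, 0.35, 0.25], [0.30, 0.50, 0.60])

# Diagnostic test with p1 = 0.02, p2 = p3 = 0.01: partition {D, N}, B = "+".
p_pos = total_probability([0.02, 0.98], [0.99, 0.01])

print(round(p_fill, 4), round(p_pos, 4))
```

The explicit check that the priors sum to 1 guards against using a collection of events that is not a partition.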


In reference to Theorem 4, stipulating the prior probabilities $P(A_j)$, $j =
1, 2, \ldots$, is often a precarious thing and guesswork. This being the case, the
question then arises of whether experimentation may lead to reevaluation of
the prior probabilities on the basis of new evidence. To put it more formally,
is it possible to use $P(A_j)$ and $P(B \mid A_j)$, $j = 1, 2, \ldots$ in order to calculate
$P(A_j \mid B)$? The answer to this question is in the affirmative, is quite simple,
and is the content of the next result.





THEOREM 5 (Bayes’ Formula) Let $\{A_1, A_2, \ldots\}$ and $B$ be as in the previous theorem.
Then, for any $j = 1, 2, \ldots$:

\[ P(A_j \mid B) = \frac{P(B \mid A_j)P(A_j)}{\sum_i P(B \mid A_i)P(A_i)}. \]
PROOF Indeed, $P(A_j \mid B) = P(A_j \cap B)/P(B) = P(B \mid A_j)P(A_j)/P(B)$, and
then the previous theorem completes the proof. ▲


The probabilities $P(A_j \mid B)$, $j = 1, 2, \ldots$, are referred to as posterior
probabilities in that they are reevaluations of the respective prior $P(A_j)$ after the
event $B$ has occurred.


EXAMPLE 21

Referring to Example 19, a question of much importance is this: Given that the
test shows positive, what is the probability that the patient actually has the
disease? In terms of the notation adopted, this question becomes: $P(D \mid +) = ?$
Bayes’ formula gives:

\[ P(D \mid +) = \frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D) + P(+ \mid N)P(N)} = \frac{p_1(1 - p_3)}{p_1(1 - p_3) + p_2(1 - p_1)}. \]

For the numerical values used above, we get:

\[ P(D \mid +) = \frac{0.02 \times 0.99}{0.0296} = \frac{0.0198}{0.0296} = \frac{198}{296} \simeq 0.669. \]

So $P(D \mid +) \simeq 66.9\%$. This result is both reassuring and surprising. Reassuring,
in that only 66.9% of those testing positive actually have the disease. Surprising,
in that this proportion looks rather low, given that the test is quite good: it
identifies correctly 99% of those having the disease. A reconciliation between
these two seemingly contradictory aspects is as follows: The fact that $P(D) =
0.02$ means that, on the average, 2 out of 100 persons have the disease. So, in
100 persons, 2 will have the disease and 98 will not. When 100 such persons are
tested, $2 \times 0.99 = 1.98$ will be correctly confirmed as positive (because 0.99 is
the probability of a correct positive), and $98 \times 0.01 = 0.98$ will be incorrectly
diagnosed as positive (because 0.01 is the probability of an incorrect positive).
Thus, the proportion of correct positives is equal to:

\[ P(D \mid +) = \frac{\text{correct positives}}{\text{correct positives} + \text{incorrect positives}} = \frac{1.98}{1.98 + 0.98} = \frac{1.98}{2.96} = \frac{198}{296} \simeq 0.669. \]
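
The posterior calculation and the head-count argument can be placed side by side. The sketch below is illustrative; `posterior` is a hypothetical helper that returns all the probabilities $P(A_j \mid B)$ at once.

```python
def posterior(priors, likelihoods):
    """Bayes' formula: list of P(A_j | B) from P(A_j) and P(B | A_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]   # P(A_j and B)
    p_b = sum(joint)                                       # total probability
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return [x / p_b for x in joint]

# Partition {D, N} with P(D) = 0.02; P(+ | D) = 0.99 and P(+ | N) = 0.01.
p_d_given_pos, p_n_given_pos = posterior([0.02, 0.98], [0.99, 0.01])

# Head-count version: out of 100 persons, 2 have the disease and 98 do not.
correct = 2 * 0.99
incorrect = 98 * 0.01
by_counts = correct / (correct + incorrect)

print(round(p_d_given_pos, 3), round(by_counts, 3))
```

Both routes give the same number because the head count is just the formula scaled by 100.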


REMARK 1 The fact that the probability $P(D \mid +)$ is less than 1 simply
reflects the fact that the test, no matter how good, is imperfect. Should the test
be perfect ($P(+ \mid D) = P(- \mid D^c) = 1$), then $P(D \mid +) = 1$, as follows from the
preceding calculations, no matter what $P(D)$ is. The same, of course, is true
for $P(D^c \mid -)$.


EXAMPLE 22

Refer to Example 20 and calculate the probabilities: $P(R \mid F)$, $P(E \mid F)$, and
$P(P \mid F)$.


DISCUSSION By Bayes’ formula and Example 20,

\[ P(R \mid F) = \frac{P(R \cap F)}{P(F)} = \frac{P(F \mid R)P(R)}{P(F)} = \frac{0.30 \times 0.40}{0.445} \simeq 0.270, \]

and likewise,

\[ P(E \mid F) = \frac{0.50 \times 0.35}{0.445} \simeq 0.393, \quad P(P \mid F) = \frac{0.60 \times 0.25}{0.445} \simeq 0.337. \]


3.1 If P(A| B) > P(A), then show that P(B| A) > P(B), by assuming that 
both P(A) and P(B) are positive. 


3.2 If $A \cap B = \varnothing$ and $P(A \cup B) > 0$, express the probabilities $P(A \mid A \cup B)$
and $P(B \mid A \cup B)$ in terms of $P(A)$ and $P(B)$.


3.3 A girls’ club has in its membership rolls the names of 50 girls with the 
following descriptions: 
20 blondes, 15 with blue eyes and 5 with brown eyes; 
25 brunettes, 5 with blue eyes and 20 with brown eyes; 
5 redheads, 1 with blue eyes and 4 with green eyes. 
If one arranges a blind date with a club member, what is the probability 
that: 
(i) The girl is blonde? 
(ii) The girl is blonde, if it was revealed only that she has blue eyes? 


3.4 Suppose that the probability that both of a pair of twins are boys is 0.30 
and that the probability that they are both girls is 0.26. Given that the 
probability of the first child being a boy is 0.52, what is the probability 
that: 

(i) The second twin is a boy, given that the first is a boy? 
(ii) The second twin is a girl, given that the first is a girl? 
(iii) The second twin is a boy? 
(iv) The first is a boy and the second is a girl? 


3.5 A shipment of 20 TV tubes contains 16 good tubes and 4 defective tubes. 
Three tubes are chosen successively and at random each time and are 
also tested successively. What is the probability that: 

(i) The third tube is good if the first two were found to be good? 
(ii) The third tube is defective if the first was found to be good and the 
second defective? 
(iii) The third tube is defective if the first was found to be defective and 
the second was found to be good? 
(iv) The third tube is defective if one of the other two was found to be 
good and the other was found to be defective? 


3.6 For any three events $A$, $B$, and $C$ with $P(A)P(B)P(C) > 0$, show that:
(i) $P(A^c \mid B) = 1 - P(A \mid B)$.
(ii) $P(A \cup B \mid C) = P(A \mid C) + P(B \mid C) - P(A \cap B \mid C)$.
Also, by means of counterexamples, show that the following equations
need not be true:
(iii) $P(A \mid B^c) = 1 - P(A \mid B)$.
(iv) $P(C \mid A \cup B) = P(C \mid A) + P(C \mid B)$, where $A \cap B = \varnothing$.


3.7 If $A$, $B$, and $C$ are any events in the sample space $S$, show that $\{A,
A^c \cap B, A^c \cap B^c \cap C, (A \cup B \cup C)^c\}$ is a partition of $S$.


3.8 Use induction to prove Theorem 3. 


3.9 Let $\{A_j, j = 1, \ldots, 5\}$ be a partition of the sample space $S$ and suppose
that:
\[ P(A_j) = \frac{j}{15} \quad \text{and} \quad P(A \mid A_j) = \frac{5 - j}{15}, \quad j = 1, \ldots, 5. \]
Compute the probabilities $P(A_j \mid A)$, $j = 1, \ldots, 5$.


3.10 A box contains 15 identical balls except that 10 are red and 5 are black. 
Four balls are drawn successively and without replacement. 
Calculate the probability that the first and the fourth balls are red. 


3.11 A box contains m + n identical balls except that m of them are red and 
n are black. A ball is drawn at random, its color is noticed, and then the 
ball is returned to the box along with r balls of the same color. Finally, a 
ball is drawn also at random. 
(i) What is the probability that the first ball is red? 
(ii) What is the probability that the second ball is red? 
(iii) Compare the probabilities in parts (i) and (ii) and comment on them. 
(iv) What is the probability that the first ball is black if the second is red? 
(v) Find the numerical values in parts (i), (ii), and (iv) if m = 9, n = 6, 
and r = 5. 


3.12 A test correctly identifies a disease D with probability 0.95 and wrongly 
diagnoses D with probability 0.01. From past experience, it is known 
that disease D occurs in a targeted population with frequency 0.2%. An 
individual is chosen at random from said population and is given the test. 
Calculate the probability that: 

(i) The test is +, P(+). 
(ii) The individual actually suffers from disease D if the test turns out to 
be positive, P(D | +). 


3.13 Suppose that the probability of correct diagnosis (either positive or neg- 
ative) of cervical cancer in the Pap test is 0.95 and that the proportion 
of women in a given population suffering from this disease is 0.01%. A 
woman is chosen at random from the target population and the test is 
administered. 
What is the probability that: 
(i) The test is positive? 
(ii) The subject actually has the disease, given that the diagnosis is 

positive? 


3.14 A signal S is sent from point A to point B and is received at B if both
switches I and II are closed. It is assumed that the probabilities of I and
II being closed are 0.8 and 0.6, respectively, and that P(II is closed | I is
closed) = P(II is closed).

















Calculate the following probabilities: 
(i) The signal is received at B. 
(ii) The (conditional) probability that switch I was open, given that the 
signal was not received at B. 
(iii) The (conditional) probability that switch II was open, given that the 
signal was not received at B. 


3.15 The student body in a certain college consists of 55% women and 45% 
men. Women and men smoke cigarettes in the proportions of 20% and 
25%, respectively. If a student is chosen at random, calculate the proba- 
bility that: 
(i) The student is a smoker. 
(ii) The student is a man, given that he/she is a smoker. 


3.16 From a population consisting of 52% females and 48% males, an individ- 
ual, drawn at random, is found to be color blind. If we assume that the 
proportions of color-blind females and males are 25% and 5%, respec- 
tively, what is the probability that the individual drawn is a male? 


3.17 Drawers I and II contain black and red pencils as follows:

Drawer I: $b_1$ black pencils and $r_1$ red pencils,

Drawer II: $b_2$ black pencils and $r_2$ red pencils.

A drawer is chosen at random and then a pencil is also chosen at random
from that drawer.

(i) What is the probability that the pencil is black?

(ii) If it is announced that the pencil is black, what is the probability it
was chosen from drawer I?

(iii) Give numerical values in parts (i) and (ii) for $b_1 = 36$, $r_1 = 12$, $b_2 =
60$, $r_2 = 24$.


3.18 Three machines I, II, and III manufacture 30%, 30%, and 40%, respectively,
of the total output of certain items. Of them, 4%, 3%, and 2%, respectively,
are defective. One item is drawn at random from the total output and is
tested.

(i) What is the probability that the item is defective?
(ii) If it is found to be defective, what is the probability the item was
produced by machine I?
(iii) Same question as in part (ii) for each one of the machines II and III.


3.19 Suppose that a multiple choice test lists n alternative answers of which
only one is correct. If a student has done the homework, he/she is certain 
to identify the correct answer; otherwise the student chooses an answer 
at random. Denote by A the event that the student does the homework, 
set p = P(A), and let B be the event that he/she answers the question 
correctly. 

(i) Express the probability P(A | B) in terms of p and n. 
(ii) If 0 < p < 1 and fixed, show that the probability P(A | B), as a func- 
tion of n, is increasing. 
(iii) Does the result in part (ii) seem reasonable? 


3.20 If the p.d.f. of the r.v. $X$ is: $f(x) = \lambda e^{-\lambda x}$, for $x > 0$ ($\lambda > 0$), calculate:
(i) $P(X > t)$ (for some $t > 0$).
(ii) $P(X > s + t \mid X > s)$ (for some $s, t > 0$).
(iii) Compare the probabilities in parts (i) and (ii), and draw your 
conclusion. 


2.4 Independent Events and Related Results


In Example 14, it was seen that $P(A \mid B) = P(A)$. Thus, the fact that the event
$B$ occurred provides no information in reevaluating the probability of $A$. Under
such a circumstance, it is only fitting to say that $A$ is independent of $B$. For
any two events $A$ and $B$ with $P(B) > 0$, we say that $A$ is independent of $B$, if
$P(A \mid B) = P(A)$. If, in addition, $P(A) > 0$, then $B$ is also independent of $A$
because

\[ P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)} = \frac{P(A \mid B)P(B)}{P(A)} = \frac{P(A)P(B)}{P(A)} = P(B). \]

Because of this symmetry, we then say that $A$ and $B$ are independent. From
the definition of either $P(A \mid B)$ or $P(B \mid A)$, it follows then that $P(A \cap B) =
P(A)P(B)$. We further observe that this relation is true even if one or both
of $P(A)$, $P(B)$ are equal to 0. We take this relation as the defining relation of
independence.


DEFINITION 2 

Two events $A_1$ and $A_2$ are said to be independent (statistically or stochastically
or in the probability sense), if $P(A_1 \cap A_2) = P(A_1)P(A_2)$. When
$P(A_1 \cap A_2) \ne P(A_1)P(A_2)$ they are said to be dependent.


REMARK 2 At this point, it should be emphasized that disjointness and
independence of two events are two distinct concepts; the former does not
even require the concept of probability. Nevertheless, they are related in that,
if $A_1 \cap A_2 = \varnothing$, then they are independent if and only if at least one of
$P(A_1)$, $P(A_2)$ is equal to 0. Thus (subject to $A_1 \cap A_2 = \varnothing$), $P(A_1)P(A_2) > 0$
implies that $A_1$ and $A_2$ are definitely dependent.
The definition of independence extends to three events $A_1, A_2, A_3$, as well
as to any number $n$ of events $A_1, \ldots, A_n$. Thus, three events $A_1, A_2, A_3$ for
which $P(A_1 \cap A_2 \cap A_3) > 0$ are said to be independent, if all conditional
probabilities coincide with the respective (unconditional) probabilities:

\[ \begin{aligned}
&P(A_1 \mid A_2) = P(A_1 \mid A_3) = P(A_1 \mid A_2 \cap A_3) = P(A_1), \\
&P(A_2 \mid A_1) = P(A_2 \mid A_3) = P(A_2 \mid A_1 \cap A_3) = P(A_2), \\
&P(A_3 \mid A_1) = P(A_3 \mid A_2) = P(A_3 \mid A_1 \cap A_2) = P(A_3), \\
&P(A_1 \cap A_2 \mid A_3) = P(A_1 \cap A_2), \quad P(A_1 \cap A_3 \mid A_2) = P(A_1 \cap A_3), \\
&P(A_2 \cap A_3 \mid A_1) = P(A_2 \cap A_3).
\end{aligned} \tag{1} \]

From the definition of conditional probability, relations (1) are equivalent to:

\[ \begin{aligned}
&P(A_1 \cap A_2) = P(A_1)P(A_2), \quad P(A_1 \cap A_3) = P(A_1)P(A_3), \\
&P(A_2 \cap A_3) = P(A_2)P(A_3), \quad P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2)P(A_3).
\end{aligned} \tag{2} \]

Furthermore, it is to be observed that relations (2) hold even if any of $P(A_1)$,
$P(A_2)$, $P(A_3)$ are equal to 0. These relations are taken as defining relations of
independence of three events $A_1, A_2, A_3$.

As one would expect, all four relations (2) are needed for independence
(that is, in order for them to imply relations (1)). That this is, indeed, the case
is illustrated by the following examples.


EXAMPLE 23

Let $S = \{1, 2, 3, 4\}$ and let $P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = 1/4$. Define
the events $A_1, A_2, A_3$ by: $A_1 = \{1, 2\}$, $A_2 = \{1, 3\}$, $A_3 = \{1, 4\}$. Then it is easily
verified that: $P(A_1 \cap A_2) = P(A_1)P(A_2)$, $P(A_1 \cap A_3) = P(A_1)P(A_3)$, $P(A_2 \cap
A_3) = P(A_2)P(A_3)$. However, $P(A_1 \cap A_2 \cap A_3) \ne P(A_1)P(A_2)P(A_3)$.


EXAMPLE 24

Let $S = \{1, 2, 3, 4, 5\}$ and let $P(\{1\}) = \frac{1}{8}$, $P(\{2\}) = P(\{3\}) = P(\{4\}) = \frac{3}{16}$,
$P(\{5\}) = \frac{5}{16}$. Define the events $A_1, A_2, A_3$ by: $A_1 = \{1, 2, 3\}$, $A_2 = \{1, 2, 4\}$, $A_3 =
\{1, 3, 4\}$. Then it is easily verified that: $P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2)P(A_3)$
but none of the other three relations in (2) is satisfied.
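
The "easily verified" claims in the two examples can be checked mechanically. The sketch below is illustrative: `relations` is a hypothetical helper, and the point masses 1/8, 3/16, 3/16, 3/16, 5/16 used for the five-point space are an assumption, chosen because they sum to 1 and produce exactly the behavior the second example describes.

```python
from fractions import Fraction as F

def prob(event, pmf):
    """Probability of a finite event as the sum of its point masses."""
    return sum(pmf[s] for s in event)

def relations(a1, a2, a3, pmf):
    """Truth values of the three pairwise relations and the triple relation."""
    p = lambda e: prob(e, pmf)
    return (
        p(a1 & a2) == p(a1) * p(a2),
        p(a1 & a3) == p(a1) * p(a3),
        p(a2 & a3) == p(a2) * p(a3),
        p(a1 & a2 & a3) == p(a1) * p(a2) * p(a3),
    )

# Four equally likely points: pairwise relations hold, the triple one fails.
uniform = {s: F(1, 4) for s in (1, 2, 3, 4)}
first = relations({1, 2}, {1, 3}, {1, 4}, uniform)

# Five points with the assumed masses: only the triple relation holds.
weights = {1: F(1, 8), 2: F(3, 16), 3: F(3, 16), 4: F(3, 16), 5: F(5, 16)}
second = relations({1, 2, 3}, {1, 2, 4}, {1, 3, 4}, weights)

print(first, second)
```

Together the two cases show that neither the pairwise relations nor the triple relation can be dropped from the definition.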


Relations (2) provide the pattern of the definition of independence of n 
events. Thus: 


DEFINITION 3 

The events $A_1, \ldots, A_n$ are said to be independent (statistically or stochastically
or in the probability sense) if, for all possible choices of $k$ out of
$n$ events ($2 \le k \le n$), the probability of their intersection equals the
product of their probabilities. More formally, for any $k$ with $2 \le k \le n$
and any integers $j_1, \ldots, j_k$ with $1 \le j_1 < \cdots < j_k \le n$, we have:

\[ P\Bigl(\bigcap_{i=1}^{k} A_{j_i}\Bigr) = \prod_{i=1}^{k} P(A_{j_i}). \tag{3} \]

If at least one of the relations in (3) is violated, the events are said to be
dependent. The number of relations of the form (3) required to express
independence of $n$ events is:

\[ \binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n} = 2^n - n - 1. \]

For example, for $n = 2, 3$, these relations are: $2^2 - 2 - 1 = 1$ and $2^3 -
3 - 1 = 4$, respectively.
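
Relation (3) translates directly into a loop over all sub-collections of size at least 2. The following is a sketch only: `is_independent` is a hypothetical helper, and the three-coin illustration is not taken from the text.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def is_independent(events, prob):
    """Check every relation of the form (3) and count how many were checked."""
    checked = 0
    ok = True
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            checked += 1
            inter = set.intersection(*subset)
            if prob(inter) != prod(prob(e) for e in subset):
                ok = False
    return ok, checked

# Illustration: three tosses of a fair coin, A_i = "toss i shows heads".
S = set(product("HT", repeat=3))
prob = lambda e: Fraction(len(e), len(S))
events = [{s for s in S if s[i] == "H"} for i in range(3)]

ok, checked = is_independent(events, prob)
print(ok, checked)
```

The counter makes the growth in the number of relations visible: it agrees with $2^n - n - 1$, so the check becomes expensive quickly as $n$ increases.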


Typical cases where independent events occur are whenever we are sam- 
pling with replacement from finite populations, such as selecting successively 
and with replacement balls from an urn containing balls of several colors, 
pulling successively and with replacement playing cards out of a standard 
deck of such cards, and the like. 

The following property of independence of events is often used without 
even being acknowledged; it is stated here as a theorem. 





THEOREM 6

(i) If the events $A_1, A_2$ are independent, then so are all three sets of
events: $A_1, A_2^c$; $A_1^c, A_2$; $A_1^c, A_2^c$.

(ii) More generally, if the events $A_1, \ldots, A_n$ are independent, then so are
the events $A_1', \ldots, A_n'$, where $A_j'$ stands either for $A_j$ or $A_j^c$, $j = 1, \ldots, n$.

For illustrative purposes, we present the proof of part (i) only.








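PROOF OF PART (i) Clearly, $A_1 \cap A_2^c = A_1 - A_1 \cap A_2$. Thus,

\[ \begin{aligned}
P(A_1 \cap A_2^c) &= P(A_1 - A_1 \cap A_2) = P(A_1) - P(A_1 \cap A_2) && \text{(since } A_1 \cap A_2 \subseteq A_1\text{)} \\
&= P(A_1) - P(A_1)P(A_2) && \text{(by independence of } A_1, A_2\text{)} \\
&= P(A_1)[1 - P(A_2)] = P(A_1)P(A_2^c).
\end{aligned} \]

The proof of $P(A_1^c \cap A_2) = P(A_1^c)P(A_2)$ is entirely symmetric. Finally,

\[ \begin{aligned}
P(A_1^c \cap A_2^c) &= P((A_1 \cup A_2)^c) && \text{(by DeMorgan’s laws)} \\
&= 1 - P(A_1 \cup A_2) \\
&= 1 - P(A_1) - P(A_2) + P(A_1 \cap A_2) \\
&= 1 - P(A_1) - P(A_2) + P(A_1)P(A_2) && \text{(by independence of } A_1, A_2\text{)} \\
&= [1 - P(A_1)] - P(A_2)[1 - P(A_1)] \\
&= P(A_1^c)P(A_2^c). \quad ▲
\end{aligned} \]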
The following examples will help illustrate concepts and results discussed
in this section.


EXAMPLE 25

Suppose that $P(B)P(B^c) > 0$. Then the events $A$ and $B$ are independent if and
only if $P(A \mid B) = P(A \mid B^c)$.


DISCUSSION First, if $A$ and $B$ are independent, then $A$ and $B^c$ are also
independent, by Theorem 6. Thus, $P(A \mid B^c) = \frac{P(A \cap B^c)}{P(B^c)} = \frac{P(A)P(B^c)}{P(B^c)} = P(A)$.
Since also $P(A \mid B) = P(A)$, the equality $P(A \mid B) = P(A \mid B^c)$ holds. Next,
$P(A \mid B) = P(A \mid B^c)$ is equivalent to $\frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B^c)}{P(B^c)}$ or $P(A \cap B)P(B^c) =
P(A \cap B^c)P(B)$ or $P(A \cap B)[1 - P(B)] = P(A \cap B^c)P(B)$ or $P(A \cap B) - P(A \cap
B)P(B) = P(A \cap B^c)P(B)$ or $P(A \cap B) = [P(A \cap B) + P(A \cap B^c)]P(B) =
P(A)P(B)$, since $(A \cap B) \cup (A \cap B^c) = A$. Thus, $A$ and $B$ are independent.








REMARK 3 It is to be pointed out that the condition $P(A \mid B) = P(A \mid B^c)$
for independence of the events $A$ and $B$ is quite natural, intuitively. It says that
the (conditional) probability of $A$ remains the same no matter which one of $B$
or $B^c$ is given.


EXAMPLE 26

Let $P(C)P(C^c) > 0$. Then the inequalities $P(A \mid C) > P(B \mid C)$ and $P(A \mid C^c) >
P(B \mid C^c)$ imply $P(A) > P(B)$.


DISCUSSION The inequalities $P(A \mid C) > P(B \mid C)$ and $P(A \mid C^c) >
P(B \mid C^c)$ are equivalent to $P(A \cap C) > P(B \cap C)$ and $P(A \cap C^c) > P(B \cap C^c)$.
Adding up these inequalities, we obtain $P(A \cap C) + P(A \cap C^c) > P(B \cap C) +
P(B \cap C^c)$ or $P(A) > P(B)$, since $A = (A \cap C) \cup (A \cap C^c)$ and $B = (B \cap C) \cup
(B \cap C^c)$.

REMARK 4 Once again, that the inequalities of the two conditional proba- 
bilities should imply the same inequality for the unconditional probabilities is 
quite obvious on intuitive grounds. The justification given above simply makes 
it rigorous. 


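EXAMPLE 27

If the events $A$, $B$, and $C$ are independent, then $P(A \cup B \cup C) = 1 - [1 -
P(A)][1 - P(B)][1 - P(C)]$.

DISCUSSION Clearly,

\[ \begin{aligned}
P(A \cup B \cup C) &= P[(A^c \cap B^c \cap C^c)^c] && \text{(by DeMorgan’s laws)} \\
&= 1 - P(A^c \cap B^c \cap C^c) && \text{(by basic property (3))} \\
&= 1 - P(A^c)P(B^c)P(C^c) && \text{(by Theorem 6(ii) applied with } n = 3\text{)} \\
&= 1 - [1 - P(A)][1 - P(B)][1 - P(C)].
\end{aligned} \]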


EXAMPLE 28

A mouse caught in a maze has to maneuver through three successive escape
hatches in order to escape. If the hatches operate independently and the
probabilities for the mouse to maneuver successfully through them are 0.6,
0.4, and 0.2, respectively, calculate the probabilities that the mouse: (i) will be
able to escape, (ii) will not be able to escape.


DISCUSSION Denote by $H_1$, $H_2$, and $H_3$ the events that the mouse successfully
maneuvers through the three hatches, and by $E$ the event that the
mouse is able to escape. We have that $H_1$, $H_2$, and $H_3$ are independent, $P(H_1) =
0.6$, $P(H_2) = 0.4$, and $P(H_3) = 0.2$, and $E = H_1 \cap H_2 \cap H_3$. Then: (i) $P(E) =
P(H_1 \cap H_2 \cap H_3) = P(H_1)P(H_2)P(H_3) = 0.6 \times 0.4 \times 0.2 = 0.048$, and (ii)
$P(E^c) = 1 - P(E) = 1 - 0.048 = 0.952$.
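
For independent events, intersections and unions both reduce to products. A small sketch, with hypothetical helper names `p_all` and `p_at_least_one`, applies the product rule to the three hatches and also evaluates the union formula for independent events on the same three probabilities.

```python
from math import prod

def p_all(ps):
    """P(intersection) of independent events: product of the probabilities."""
    return prod(ps)

def p_at_least_one(ps):
    """P(union) of independent events: 1 minus the product of complements."""
    return 1 - prod(1 - p for p in ps)

hatches = [0.6, 0.4, 0.2]
p_escape = p_all(hatches)                  # all three hatches are passed
p_trapped = 1 - p_escape                   # complement of escaping
p_some_hatch = p_at_least_one(hatches)     # at least one hatch would be passed

print(round(p_escape, 3), round(p_trapped, 3), round(p_some_hatch, 3))
```

The last quantity is not asked for in the mouse example; it is included only to show the union formula at work on the same inputs.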


The concept of independence carries over to random experiments. Al- 
though a technical definition of independence of random experiments is avail- 
able, we are not going to indulge in it. The concept of independence of random 
experiments will be taken in its intuitive sense, and somewhat more techni- 
cally, in the sense that random experiments are independent if they give rise 
to independent events associated with them. 

Finally, independence is also defined for r.v.’s. This topic will be taken up
in Chapter 5 (see Definition 1 there). Actually, independence of r.v.’s is one of
the building blocks of most discussions taking place in this book.


4.1 If P(A) = 0.4, P(B) = 0.2, and P(C) = 0.3, calculate the probability 
P(AU BU C), if the events A, B, and C are: 
(i) Pairwise disjoint. 
(ii) Independent. 

4.2 Show that the event A is independent of itself if and only if P(A) = 0 or 
P(A) =1. 


4.3 (i) For any two events $A$ and $B$, show that $P(A \cap B) \ge P(A) + P(B) - 1$.
(ii) If $A$ and $B$ are disjoint, then show that they are independent if and
only if at least one of $P(A)$ and $P(B)$ is zero.
(iii) If the events $A$, $B$, and $C$ are pairwise disjoint, under what conditions
are they independent?


4.4 Suppose that the events $A_1$, $A_2$, and $B_1$ are independent, the events $A_1$, $A_2$,
and $B_2$ are independent, and that $B_1 \cap B_2 = \varnothing$. Then show that the events
$A_1$, $A_2$, $B_1 \cup B_2$ are independent.


4.5 (i) If for the events A, B, and C, it so happens that P(A) = P(B) = 
PCC) = 5 P(ANB) = P(ANC) = P(BNC) = 5 and P(ANBNC) = 
a determine whether or not these events are independent. Justify 
your answer. 
(ii) For the values given in part (i), calculate the probabilities: $P(A^c)$,
$P(A \cup B)$, $P(A^c \cap B^c)$, $P(A \cup B \cup C)$, and $P(A^c \cap B^c \cap C^c)$.


4.6 For the events A, B, C and their complements, suppose that:


1 5 3 
P(ANBNC)= TE’ P(ANB"NC)= 16 P(ANBNCS)= 16 


, 


2 2 1 
PANBNO) == PANBNC)= iy PANBNC)=>3 


1 1 
PAN BNC) = FE, and PAN BEN) = ig 


(i) Calculate the probabilities: P(A), P(B), P(C). 

(11) Determine whether or not the events A, B, and C are independent. 
(iii) Calculate the (conditional) probability P(A | B). 
(iv) Determine whether or not the events A and B are independent. 


4.7 If the events $A_1, \ldots, A_n$ are independent, show that

\[ P\Bigl(\bigcup_{j=1}^{n} A_j\Bigr) = 1 - \prod_{j=1}^{n} P(A_j^c). \]

4.8 (i) Three coins, with probability of falling heads being p, are tossed once 
and you win, if all three coins show the same face (either all H or all 
T). What is the probability of winning? 
(ii) What are the numerical answers in part (i) for p = 0.5 and p = 0.4? 


4.9 Suppose that men and women are distributed in the freshman and sopho- 
more classes of a college according to the proportions listed in the fol- 
lowing table. 


Class\Gender    M       W       Totals

F               4       6       10
S               6       x       6 + x
Totals          10      6 + x   16 + x


A student is chosen at random and let M, W, F, and S be the events,
respectively, that the student is a man, a woman, a freshman, or a sophomore.
Then, being a man or a woman and being a freshman or sophomore
are independent, if:

\[ P(M \cap F) = P(M)P(F), \quad P(W \cap F) = P(W)P(F), \]
\[ P(M \cap S) = P(M)P(S), \quad P(W \cap S) = P(W)P(S). \]

Determine the number x, so that the preceding independence relations 
hold. 

4.10 The r.v. $X$ has p.d.f. given by:
\[ f(x) = \begin{cases} cx, & 0 \le x < 5 \\ c(10 - x), & 5 \le x \le 10 \\ 0, & \text{elsewhere.} \end{cases} \]
(i) Determine the constant c.
(ii) Draw the graph of f.




















Define the events A and B by: A= (X > 5), B= (5 < X < 7.5). 
(iii) Calculate the probabilities P(A) and P(B). 
(iv) Calculate the conditional probability P(B | A). 
(v) Are the events A and B independent or not? Justify your answer. 


4.11 Three players I, II, III throw simultaneously three coins with respective
probabilities of falling heads (H) $p_1$, $p_2$, and $p_3$. A sample space describing
this experiment is:

\[ S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}. \]

Define the events $A_i$, $i = 1, 2, 3$ and $B$ by:

\[ A_1 = \{HTT, THH\}, \quad A_2 = \{THT, HTH\}, \quad A_3 = \{TTH, HHT\} \]

(i.e., the outcome for the ith player, $i = 1, 2, 3$, is different from those for
the other two players),

\[ B = \{HHH, TTT\}. \]

If any one of the events $A_i$, $i = 1, 2, 3$ occurs, the ith player wins and
the game ends. If event $B$ occurs, the game is repeated independently as
many times as needed until one of the events $A_1$, $A_2$, $A_3$ occurs.

(i) Calculate the probabilities: $P(A_i)$, $i = 1, 2, 3$.

(ii) What do these probabilities become for $p_1 = p_2 = p_3 = p$?
(iii) What is the numerical value in part (ii) if $p = 0.5$?


Hint: By symmetry, it suffices to calculate $P(A_1)$. Let $A_{1j} =$ “event $A_1$
occurs the jth time,” $B_j =$ “event $B$ occurs the jth time.” Then (with
slight abuse of notation)

\[ A_1 = A_{11} \cup (B_1 \cap A_{12}) \cup (B_1 \cap B_2 \cap A_{13}) \cup \cdots \]

At this point, also recall that: $\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}$, $|x| < 1$.


4.12 Jim takes the written and road driver's license tests repeatedly until he
passes them. It is given that the probability that he passes the written test
is 0.9, that he passes the road test is 0.6, and that the tests are independent
of each other. Furthermore, it is assumed that the road test cannot be
taken unless he passes the written test, and that once he passes the
written test, he does not have to take it again ever, no matter whether he
passes or fails his road tests. Also, it is assumed that the written and the
road test are distinct attempts.
(i) What is the probability that he will pass the road test on his nth
attempt?
(ii) What is the numerical value in part (i) for n = 5?


Hint: Denote by $W_i$ and $R_j$ the events that Jim passes the written test
and the road test the ith and jth time, respectively. Then the required
event is expressed as follows:

\[ (W_1 \cap R_1^c \cap \cdots \cap R_{n-2}^c \cap R_{n-1}) \cup (W_1^c \cap W_2 \cap R_1^c \cap \cdots \cap R_{n-3}^c \cap R_{n-2}) \cup \cdots \cup (W_1^c \cap \cdots \cap W_{n-2}^c \cap W_{n-1} \cap R_1). \]


4.13 The probability that a missile fired against a target is not intercepted 
by an antimissile missile is 2. If the missile is not intercepted, then the 
probability of a successful hit is 3. 

If four missiles are fired independently, what is the probability that: 

(i) All four will successfully hit the target?
(ii) At least one will do so?
(iii) What is the minimum number of missiles to be fired so that at least
one is not intercepted with probability at least 0.95?
(iv) What is the minimum number of missiles to be fired so that at least
one hits the target with probability at least 0.99?


4.14 Electric current is transmitted from point A to point B, provided at least
one of the circuits #1 through #n here is closed. It is assumed that the n
circuits close independently of each other and with respective probabilities
$p_1, \ldots, p_n$.








Determine the following probabilities: 
(i) No circuit is closed.
(ii) At least one circuit is closed.
(iii) Exactly one circuit is closed.
(iv) How do the expressions in parts (i)-(iii) simplify if $p_1 = \cdots =
p_n = p$?
(v) What are the numerical values in part (iv) for n = 5 and p = 0.6?

4.15 Consider two urns $U_1$ and $U_2$ such that urn $U_1$ contains $m_1$ white balls
and $n_1$ black balls, and urn $U_2$ contains $m_2$ white balls and $n_2$ black balls.
All balls are identical except for color. One ball is drawn at random from
each of the urns $U_1$ and $U_2$ and is placed into a third urn. Then a ball is
drawn at random from the third urn. Compute the probability that the
ball is:

(i) Black; (ii) White.
(iii) Give numerical answers to parts (i) and (ii) for: $m_1 = 10$, $n_1 = 15$;
$m_2 = 35$, $n_2 = 25$.





2.5 Basic Concepts and Results in Counting


In this brief section, some basic concepts and results are discussed regarding 
the way of counting the total number of outcomes of an experiment or the total 
number of different ways we can carry out a task. Although many readers will, 
undoubtedly, be familiar with parts of or the entire material in this section, it 
would be advisable, nevertheless, to invest some time here in introducing and 
adopting some notation, establishing some basic results, and then using them 
in computing probabilities in the classical probability framework. 

Problems of counting arise in a great number of different situations. In each
one of these situations, we are asked to compute the number of different ways
that something or other can be done. Here are a few illustrative cases.


EXAMPLE 29

(i) Attire yourself by selecting a T-shirt, a pair of trousers, a pair of shoes,
and a cap out of $n_1$ T-shirts, $n_2$ pairs of trousers, $n_3$ pairs of shoes, and
$n_4$ caps (e.g., $n_1 = 4$, $n_2 = 3$, $n_3 = n_4 = 2$).
(ii) Form all k-digit numbers by selecting the k digits out of n available num- 
bers (e.g., k = 2, n = 4 such as {1, 3, 5, 7}). 
(iii) Form all California automobile license plates by using one number, three 
letters and then three numbers in the prescribed order. 
(iv) Form all possible codes by using a given set of symbols (e.g., form all 
“words” of length 10 by using the digits 0 and 1). 
(v) Place k books on the shelf of a bookcase in all possible ways. 
(vi) Place the birthdays of k individuals in the 365 days of a year in all pos- 
sible ways. 
(vii) Place k letters into k addressed envelopes (one letter to each envelope). 
(viii) Count all possible outcomes when tossing k distinct dice. 
(ix) Select k cards out of a standard deck of playing cards (e.g., for k = 5, 
each selection is a poker hand). 
(x) Form all possible k-member committees out of n available individuals. 





The calculation of the numbers asked for in situations (i) through (x) just 
outlined is in actuality a simple application of the so-called fundamental prin- 
ciple of counting, stated next in the form of a theorem. 

THEOREM 7
(Fundamental Principle of Counting) Suppose a task is completed in k
stages by carrying out a number of subtasks in each one of the k stages. If
the numbers of these subtasks are $n_1, \ldots, n_k$ for the k stages, respectively,
then the total number of different ways the overall task is completed is:
$n_1 \times \cdots \times n_k$.
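
The product rule of Theorem 7 can be compared against a brute-force listing of all outcomes. The following is only a minimal sketch: the function names and the encoding of the stages as Python sequences are assumptions made for illustration, and `itertools.product` covers only the case where the choices available at a stage do not depend on what was chosen earlier.

```python
from itertools import product
from math import prod


def count_by_enumeration(stages):
    """Count outcomes by listing every way of picking one option per stage."""
    return sum(1 for _ in product(*stages))


def count_by_principle(stages):
    """Count outcomes as n_1 x ... x n_k, as stated in Theorem 7."""
    return prod(len(stage) for stage in stages)


# Hypothetical wardrobe for Example 29(i): 4 T-shirts, 3 trousers, 2 shoes, 2 caps.
attire = [range(4), range(3), range(2), range(2)]
# Example 29(ii): two positions, each filled from the digits 1, 3, 5, 7.
two_digit = [(1, 3, 5, 7)] * 2

print(count_by_enumeration(attire), count_by_principle(attire))
print(count_by_enumeration(two_digit), count_by_principle(two_digit))
```

Both approaches agree on 48 attires and 16 two-digit numbers; the enumeration becomes impractical quickly, which is exactly why the product formula is useful.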











Thus, in (i) above the number of different attires is: 4 × 3 × 2 × 2 = 48.

In (ii), the number of all 2-digit numbers formed by using 1, 3, 5, 7 is:
4 × 4 = 16 (11, 13, 15, 17; 31, 33, 35, 37; 51, 53, 55, 57; 71, 73, 75, 77).

In (iii), the number of all possible license plates (by using indiscriminately
all 10 digits from 0 through 9 and all 26 letters of the English alphabet, although
this is not the case in practice) is: 10 × (26 × 26 × 26) × (10 × 10 × 10) =
175,760,000.

In (iv), the number of all possible “words” is found by taking k = 10 and
$n_1 = \cdots = n_{10} = 2$ to obtain: $2^{10}$ = 1,024.

In (v), all possible arrangements are obtained by taking $n_1 = k$, $n_2 = k - 1$,
..., $n_k = k - (k - 1) = 1$ to get: $k(k-1) \cdots 1 = 1 \cdots (k-1)k$. For example,
for k = 10, the number of arrangements is: 3,628,800.

In (vi), the required number is obtained by taking $n_1 = \cdots = n_k = 365$ to
get: $365^k$. For example, for k = 3, we have $365^3$ = 48,627,125.

In (vii), the required number is: $k(k-1) \cdots 1 = 1 \cdots (k-1)k$, obtained by
taking $n_1 = k$, $n_2 = k - 1$, ..., $n_k = k - (k - 1) = 1$.

In (viii), the required number is: $6^k$, obtained by taking $n_1 = \cdots = n_k = 6$.
For example, for k = 3, we have $6^3$ = 216, and for k = 10, we have $6^{10}$ =
60,466,176.

In (ix), the number of poker hands is: $\frac{52 \times 51 \times 50 \times 49 \times 48}{120}$ = 2,598,960. The
numerator is obtained by taking $n_1 = 52$, $n_2 = 51$, $n_3 = 50$, $n_4 = 49$, $n_5 = 48$.
The division by 120 accounts for elimination of hands consisting of the same
cards but drawn in different order.

Finally, in (x), the required number is:

$$\frac{n(n-1) \cdots (n-k+1)}{1 \times 2 \times \cdots \times k},$$

by arguing as in (ix). For example, for n = 10 and k = 3, we have: $\frac{10 \times 9 \times 8}{1 \times 2 \times 3}$ = 120.

In all of the situations (i) through (x), the required numbers were calculated
by the appropriate application of Theorem 7. Furthermore, in many cases, as
clearly exemplified by cases (ii), (iii), (v), (vii), (ix), and (x), the task performed
consisted of selecting and arranging a number of objects out of a set of available
objects. In so doing, the order in which the objects appear in the arrangement
may be of significance, as is, indeed, the case in situations (ii), (iii), (iv), (v),
(vi), and (vii), or it may be just irrelevant, as happens, for example, in cases
(ix) and (x). This observation leads us to the concepts of permutations and
combinations. More precisely, we have


DEFINITION 4
An ordered arrangement of k objects taken from a set of n objects
($1 \le k \le n$) is a permutation of the n objects taken k at a time. An unordered
arrangement of k objects taken from a set of n objects is a combination
of the n objects taken k at a time.


The question then arises of how many permutations and how many com- 
binations can be formed. The answer to this question is given next. 


COROLLARY (to Theorem 7) 


(i) The number of ordered arrangements of a set of n objects taken k at a time
($1 \le k \le n$) is $n^k$ when repetitions are allowed. When no repetitions are
allowed, this number becomes the permutations of n objects taken k at a
time, is denoted by $P_{n,k}$, and is given by:

$$P_{n,k} = n(n-1) \cdots (n-k+1). \qquad (4)$$

In particular, for k = n,

$$P_{n,n} = n(n-1) \cdots 1 = 1 \cdots (n-1)n = n!,$$

where the notation n! is read “n factorial.”

(ii) The number of combinations (i.e., the number of unordered and without
repetition arrangements) of n objects taken k at a time ($1 \le k \le n$) is
denoted by $\binom{n}{k}$ and is given by:

$$\binom{n}{k} = \frac{P_{n,k}}{k!} = \frac{n!}{k!\,(n-k)!}. \qquad (5)$$


REMARK 5 Whether permutations or combinations are appropriate in a
given problem follows from the nature of the problem. For instance, in (ii),
permutations rather than combinations are appropriate as, e.g., 13 and 31 are
distinct entities. The same is true of cases (iii)-(viii), whereas combinations
are appropriate for cases (ix) and (x).

As an example, in part (ii), $P_{4,2}$ = 4 × 3 = 12 (leave out the numbers with
identical digits 11, 33, 55, and 77), and in part (ix), $\binom{52}{5} = \frac{52!}{5!\,47!}$ = 2,598,960,
after cancellations and by carrying out the arithmetic.
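
The distinction between ordered and unordered arrangements can be made concrete with the Python standard library. This is a small illustrative sketch, not part of the text: `itertools.permutations` and `itertools.combinations` list the arrangements explicitly, while `math.perm` and `math.comb` (available from Python 3.8) evaluate formulas (4) and (5) directly.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

digits = (1, 3, 5, 7)

# Ordered arrangements without repetition: P_{4,2} = 4 x 3.
ordered = list(permutations(digits, 2))
# Unordered arrangements without repetition: C(4, 2) = P_{4,2} / 2!.
unordered = list(combinations(digits, 2))

print(len(ordered), perm(4, 2))
print(len(unordered), comb(4, 2), perm(4, 2) // factorial(2))
print(comb(52, 5))  # number of poker hands, Example 29(ix)
print(comb(10, 3))  # 3-member committees out of 10, Example 29(x)
```

Each unordered pair such as (1, 3) corresponds to 2! ordered pairs, (1, 3) and (3, 1), which is the relation used to obtain (5) from (4).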








REMARK 6 In (5), set k = n. Then the left-hand side is, clearly, 1, and the
right-hand side is $\frac{n!}{n!\,0!} = \frac{1}{0!}$. In order for this to be 1, we define 0! = 1. From
formula (5), it also follows that $\binom{n}{0}$ = 1.

This section is concluded with the justification of Theorem 7 and its corol- 
lary and some applications of these results in calculating certain probabilities. 


PROOF OF THEOREM 7 It is done by induction. For k = 2, all one has to do
is to pair out each one of the $n_1$ ways of carrying out the subtask at stage 1
with each one of the $n_2$ ways of carrying out the subtask at stage 2 in order
to obtain $n_1 \times n_2$ for the number of ways of completing the task. Next, make
the induction hypothesis that the conclusion is true for k = m and establish it
for k = m + 1. So, in the first m stages, the total number of ways of doing the
job is: $n_1 \times \cdots \times n_m$, and there is still the final (m + 1)st stage for completing
the task. Clearly, all we have to do here is to combine each one of the
$n_1 \times \cdots \times n_m$ ways of doing the job in the first m stages with each one of the
$n_{m+1}$ ways of carrying out the subtask in the (m + 1)st stage to obtain the number
$n_1 \times \cdots \times n_m \times n_{m+1}$ of ways of completing the task. ▲


PROOF OF THE COROLLARY 


(i) Here, we are forming an ordered arrangement of objects in k stages by
selecting one object at each stage from among the n available objects
(because repetitions are allowed). Thus, the theorem applies with
$n_1 = \cdots = n_k = n$ and gives the result $n^k$. When repetitions are not allowed,
the only thing which changes from the case just considered is that: $n_1 = n$,
$n_2 = n - 1$, ..., $n_k = n - (k - 1) = n - k + 1$, and formula (4) follows.

(ii) Let $\binom{n}{k}$ be the number of combinations (unordered without repetition
arrangements) of the n objects taken k at a time. From each one of these
unordered arrangements, we obtain k! ordered arrangements by permutation
of the k objects. Then $k! \times \binom{n}{k}$ is the total number of ordered arrangements
of the n objects taken k at a time, which is $P_{n,k}$, by part (i). Solving for
$\binom{n}{k}$, we obtain the first expression in (5). The second expression follows
immediately by multiplying by $(n-k) \cdots 1$ and dividing by $1 \cdots (n-k) = (n-k)!$. ▲

There are many interesting variations and deeper results based on 

Theorem 7 and its corollary. Some of them may be found in Sections 2.4 and 
2.6 of Chapter 2 of the book A Course in Mathematical Statistics, 2nd edition 
(1997), Academic Press, by G.G. Roussas. 


It happens that 4 hotels in a certain large city have the same name, e.g., Grand 
Hotel. Four persons make an appointment to meet at the Grand Hotel. If each 
one of the 4 persons chooses the hotel at random, calculate the following 
probabilities: 

(i) All 4 choose the same hotel.

(ii) All 4 choose different hotels.


DISCUSSION 
(i) If A = “all 4 choose the same hotel,” then $P(A) = \frac{n(A)}{n(S)}$, where n(A) is
the number of sample points in A. Here, $n(S) = 4 \times 4 \times 4 \times 4 = 4^4$,
by Theorem 7 applied with k = 4 and $n_1 = n_2 = n_3 = n_4 = 4$, and
n(A) = 4, by Theorem 7 again applied with k = 1 (the 4 people looked
upon as a single unity) and $n_1 = 4$ (the 4 hotels they can choose). Thus,
$P(A) = \frac{4}{4^4} = \frac{1}{4^3} = \frac{1}{64} = 0.015625 \simeq 0.016$.

(ii) If B = “all 4 choose different hotels,” then, by the first part of the corollary
to Theorem 7, $n(B) = P_{4,4} = 4!$, so that $P(B) = \frac{4!}{4^4} = \frac{1 \times 2 \times 3}{4^3} = \frac{3}{32} = 0.09375 \simeq 0.094$.
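
The sample space in the hotel example is small enough to list in full, which gives an independent check on the counting argument. The sketch below is illustrative; the hotel labels 0 through 3 and the variable names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

HOTELS = range(4)  # hypothetical labels for the 4 hotels

# Each outcome records the hotel chosen by persons 1 through 4.
outcomes = list(product(HOTELS, repeat=4))

same = sum(1 for o in outcomes if len(set(o)) == 1)       # event A
different = sum(1 for o in outcomes if len(set(o)) == 4)  # event B

p_same = Fraction(same, len(outcomes))
p_different = Fraction(different, len(outcomes))
print(len(outcomes), same, different)
print(p_same, float(p_same))
print(p_different, float(p_different))
```

The enumeration finds 256 equally likely outcomes, of which 4 lie in A and 24 lie in B, in agreement with the values 1/64 and 3/32 obtained from Theorem 7 and its corollary.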

Out of a set of 3 keys, only 1 opens a certain door. Someone tries the keys
successively and let $A_k$ be the event that the right key appears the kth time.
Calculate the probability $P(A_k)$:


(i) If the keys tried are not replaced, k = 1, 2, 3. 
(ii) If the keys tried are replaced, k = 1, 2,.... 





DISCUSSION 
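(i) $P(A_1) = \frac{1}{3}$; $P(A_2) = \frac{2 \times 1}{3 \times 2} = \frac{1}{3}$; $P(A_3) = \frac{2 \times 1 \times 1}{3 \times 2 \times 1} = \frac{1}{3}$. So, $P(A_1) = P(A_2) =
P(A_3) = \frac{1}{3} \simeq 0.333$.

(ii) Clearly, $P(A_k) = P(W_1 \cap \cdots \cap W_{k-1} \cap R_k) = \left(\frac{2}{3}\right)^{k-1} \times \frac{1}{3}$ for all k = 1, 2, ...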


REMARK 7 To calculate the probabilities in part (i) in terms of conditional
probabilities, set: $R_k$ = “the right key appears the kth time,” $W_k$ = “a wrong
key appears the kth time,” k = 1, 2, 3. Then $P(A_1) = P(R_1) = \frac{1}{3}$,
$P(A_2) = P(W_1 \cap R_2) = P(R_2 \mid W_1)P(W_1) = \frac{1}{2} \times \frac{2}{3} = \frac{1}{3}$, and
$P(A_3) = P(W_1 \cap W_2 \cap R_3) = P(R_3 \mid W_1 \cap W_2)P(W_2 \mid W_1)P(W_1) = \frac{1}{1} \times \frac{1}{2} \times \frac{2}{3} = \frac{1}{3}$.
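
Both parts of the key example can be checked with exact arithmetic. The following sketch is only illustrative: the key labels and function names are assumptions. Without replacement, the 3! orders in which the keys can be tried are equally likely, so they can simply be listed; with replacement, the trials are independent and the product formula applies.

```python
from fractions import Fraction
from itertools import permutations

KEYS = ("right", "wrong1", "wrong2")  # hypothetical labels for the 3 keys


def p_without_replacement(k):
    """P(A_k) when tried keys are set aside: list all orders of the 3 keys."""
    orders = list(permutations(KEYS))
    hits = sum(1 for order in orders if order[k - 1] == "right")
    return Fraction(hits, len(orders))


def p_with_replacement(k):
    """P(A_k) when tried keys are put back: k - 1 wrong keys, then the right one."""
    return Fraction(2, 3) ** (k - 1) * Fraction(1, 3)


print([p_without_replacement(k) for k in (1, 2, 3)])
print([p_with_replacement(k) for k in (1, 2, 3)])
```

Without replacement every position is equally likely to hold the right key, whereas with replacement the probabilities decrease geometrically in k.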

The faculty in an academic department in UC-Davis consists of 4 assistant
professors, 6 associate professors, and 5 full professors. Also, it has 30 graduate
students. An ad hoc committee of 5 is to be formed to study a certain curricular
matter.


(i) What is the number of all possible committees consisting of faculty alone? 
(ii) How many committees can be formed if 2 graduate students are to be 
included and all academic ranks are to be represented? 
(iii) If the committee is to be formed at random, what is the probability that 
the faculty will not be represented? 


DISCUSSION _ It is clear that combinations are the appropriate tool here. 
Then we have: 


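(i) This number is: $\binom{15}{5} = \frac{15!}{5!\,10!} = \frac{11 \times 12 \times 13 \times 14 \times 15}{1 \times 2 \times 3 \times 4 \times 5} = 3{,}003$.

(ii) Here the number is: $\binom{30}{2}\binom{4}{1}\binom{6}{1}\binom{5}{1} = \frac{30!}{2!\,28!} \times 4 \times 6 \times 5 = \frac{29 \times 30}{2} \times 120 = 52{,}200$.

(iii) The required probability is:

$$\frac{\binom{30}{5}\binom{15}{0}}{\binom{45}{5}} = \frac{\binom{30}{5}}{\binom{45}{5}} = \frac{30!/(5!\,25!)}{45!/(5!\,40!)} = \frac{26 \times 27 \times 28 \times 29 \times 30}{41 \times 42 \times 43 \times 44 \times 45} = \frac{2{,}262}{19{,}393} \simeq 0.117.$$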
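
The three committee counts are direct applications of formula (5) and can be reproduced with `math.comb`. This is a brief sketch under the assumptions of the example (4 assistant, 6 associate, and 5 full professors, and 30 graduate students); the constant and variable names are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

ASSISTANT, ASSOCIATE, FULL, STUDENTS = 4, 6, 5, 30
FACULTY = ASSISTANT + ASSOCIATE + FULL
COMMITTEE_SIZE = 5

# (i) Committees consisting of faculty alone.
faculty_only = comb(FACULTY, COMMITTEE_SIZE)

# (ii) Two students plus one member of each of the three academic ranks.
two_students_all_ranks = comb(STUDENTS, 2) * ASSISTANT * ASSOCIATE * FULL

# (iii) Probability that a randomly formed committee has no faculty member.
p_no_faculty = Fraction(comb(STUDENTS, COMMITTEE_SIZE),
                        comb(FACULTY + STUDENTS, COMMITTEE_SIZE))

print(faculty_only, two_students_all_ranks)
print(p_no_faculty, round(float(p_no_faculty), 3))
```

Using `Fraction` keeps the probability in lowest terms, so the cancellations carried out by hand in part (iii) happen automatically.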











What is the probability that a poker hand contains 4 pictures, including at least 
2 Jacks? It is recalled here that there are 12 pictures consisting of 4 Jacks, 4 
Queens, and 4 Kings. 


DISCUSSION A poker hand can be selected in $\binom{52}{5}$ ways. The event
described, call it A, consists of the following number of sample points:
$n(A) = n(J_2) + n(J_3) + n(J_4)$, where $J_i$ = “the poker hand contains exactly i Jacks,”
i = 2, 3, 4. But

$$n(J_2) = \binom{4}{2}\binom{8}{2}\binom{40}{1}, \quad n(J_3) = \binom{4}{3}\binom{8}{1}\binom{40}{1}, \quad n(J_4) = \binom{4}{4}\binom{8}{0}\binom{40}{1},$$

so that

$$P(A) = \frac{\left[\binom{4}{2}\binom{8}{2} + \binom{4}{3}\binom{8}{1} + \binom{4}{4}\binom{8}{0}\right]\binom{40}{1}}{\binom{52}{5}} = \frac{8{,}040}{2{,}598{,}960} \simeq 0.003.$$

(For the calculation of $\binom{52}{5}$ see Example 29(ix).)
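
The count n(A) splits according to the number of Jacks, and each piece is a product of three binomial coefficients. A minimal sketch of that computation follows; it assumes, as the discussion does, that the hand contains exactly 4 picture cards and 1 of the 40 non-picture cards, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def hands_with_jacks(j):
    """Hands with exactly j Jacks, 4 - j Queens or Kings, and 1 non-picture card."""
    return comb(4, j) * comb(8, 4 - j) * comb(40, 1)


favorable = sum(hands_with_jacks(j) for j in (2, 3, 4))
p_a = Fraction(favorable, comb(52, 5))

print([hands_with_jacks(j) for j in (2, 3, 4)], favorable)
print(round(float(p_a), 4))
```

The three terms are 6,720, 1,280, and 40, adding up to 8,040 favorable hands.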


Each of the 2n members of a committee flips a fair coin in deciding whether 
or not to attend a meeting of the committee; a committee member attends the 
meeting if an H appears. What is the probability that a majority will show up 
for the meeting? 


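DISCUSSION There will be a majority if there are at least n + 1 committee
members present, which amounts to having at least n + 1 H’s in 2n independent
throws of a fair coin. If X is the r.v. denoting the number of H’s in the 2n throws,
then the required probability is: $P(X \ge n+1) = \sum_{x=n+1}^{2n} P(X = x)$. However,

$$P(X = x) = \binom{2n}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{2n-x} = \frac{1}{2^{2n}}\binom{2n}{x},$$

since there are $\binom{2n}{x}$ ways of having x H’s in 2n throws. Therefore

$$P(X \ge n+1) = \frac{1}{2^{2n}}\sum_{x=n+1}^{2n}\binom{2n}{x} = \frac{1}{2^{2n}}\left[\sum_{x=0}^{2n}\binom{2n}{x} - \sum_{x=0}^{n}\binom{2n}{x}\right]$$

$$= \frac{1}{2^{2n}}\left[2^{2n} - \sum_{x=0}^{n}\binom{2n}{x}\right] = 1 - \frac{1}{2^{2n}}\sum_{x=0}^{n}\binom{2n}{x} = 1 - P(X \le n).$$

For example, for 2n = 10, $P(X \ge 6)$ = 1 − 0.6230 = 0.377 (from the binomial
tables).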
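
The value read from the binomial tables can also be obtained by summing the binomial coefficients directly. The sketch below is illustrative (the function names are assumptions); it evaluates the upper tail both directly and through the complement, as in the derivation above.

```python
from fractions import Fraction
from math import comb


def p_majority(n):
    """P(X >= n + 1) for X = number of H's in 2n tosses of a fair coin."""
    upper_tail = sum(comb(2 * n, x) for x in range(n + 1, 2 * n + 1))
    return Fraction(upper_tail, 2 ** (2 * n))


def p_majority_by_complement(n):
    """The same probability computed as 1 - P(X <= n)."""
    lower = sum(comb(2 * n, x) for x in range(0, n + 1))
    return 1 - Fraction(lower, 2 ** (2 * n))


print(p_majority(5), round(float(p_majority(5)), 3))
print(p_majority(5) == p_majority_by_complement(5))
```

For 2n = 10 the exact value is 386/1024 = 193/512, which rounds to the tabulated 0.377. The probability is below 1/2 because a tie (exactly n attendees) does not count as a majority.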


5.1 Telephone numbers at UC-Davis consist of 7-digit numbers the first 3 
of which are 752. It is estimated that about 15,000 different telephone 
numbers are needed to serve the university’s needs. 

Are there enough telephone numbers available for this purpose? Justify 
your answer. 


5.2 An experimenter is studying the effects of temperature, pressure, and 
a catalyst on the yield of a certain chemical reaction. Three different 
temperatures, four different pressures, and five different catalysts are 
under consideration. 
(i) If any particular experimental run involves the use of a single
temperature, pressure, and catalyst, how many experimental runs are
possible?

(ii) How many experimental runs are there that involve use of the lowest 
temperature and two lowest pressures? 

(iii) How many experimental runs are possible if a specified catalyst is
to be used? 


5.3 (i) Given that a zip code consists of a 5-digit number, where the digits
are selected from among the numbers 0, 1,..., 9, calculate the 
number of all different zip codes. 

(ii) If X is the r.v. defined by: X(zip code) = # of nonzero digits in the 
zip code, which are the possible values of X? 
(iii) Give 3 zip codes and the respective values of X. 


5.4 How many 5-digit numbers can be formed by using the numbers 1, 2, 3, 
4, and 5, so that odd positions are occupied by odd numbers and even 
positions are occupied by even numbers, if: 

(i) Repetitions are allowed. 
(ii) Repetitions are not allowed. 


5.5 Form three-digit numbers by using the numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 
9, and satisfying one of the following requirements: 
(i) No restrictions are imposed. 
(ii) All three digits are distinct. 
(iii) All three-digit numbers start with 1 and end with 0. 
If the three-digit numbers are formed at random, calculate the prob- 
ability that such a number will be: 
(iv) As described in (ii). 
(v) As described in (iii). 


5.6 On a straight line, there are n spots to be filled in by either a dot or a dash. 
What is the number of the distinct groups of resulting symbols? What is 
this number if n = 5, 10, 15, 20, and 25? 


5.7 A child's set of blocks consists of 2 red, 4 blue, and 5 yellow cubes. The 
blocks can be distinguished only by color. If the child lines up the blocks 
in a row at random, calculate the following probabilities: 

(i) Red blocks appear at both ends. 
(ii) All yellow blocks are adjacent. 
(iii) Blue blocks appear at both ends. 


5.8 Suppose that the letters C, E, F, F, I, and O are written on six chips and 
placed into a box. Then the six chips are mixed and drawn one by one 
without replacement. What is the probability that the word “OFFICE” is 
formed? 


5.9 For any integers m and n with $0 \le m \le n$, show that $\binom{n}{m} = \binom{n}{n-m}$ either by
calculation, or by using a suitable argument without writing out anything.


5.10 Show that $\binom{n+1}{m+1} \big/ \binom{n}{m} = \frac{n+1}{m+1}$.

5.11 If M, N, and m are positive integers with $m \le M$, show that:

$$\binom{M}{m} = \binom{M-1}{m} + \binom{M-1}{m-1},$$

by recalling that $\binom{k}{x} = 0$ for x > k.


5.12 Without any calculations and by recalling that $\binom{k}{x} = 0$ for x > k, show


ON) 


5.13 The binomial expansion formula states that, for any x and y real and n a
positive integer:

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

Use this formula in order to show that:

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n \quad \text{and} \quad \sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0.$$

5.14 In the plane, there are n points such that no three of them lie on a straight 

line. How many triangles can be formed? What is this number for n = 10? 


5.15 Beethoven wrote 9 symphonies, Mozart wrote 27 piano concertos, and 
Schubert wrote 15 string quartets. 

(i) If a university radio station announcer wishes to play first a Beethoven
symphony, then a Mozart concerto, and then a Schubert string quartet,
in how many ways can this be done?

(ii) What is the number in part (i) if all possible orderings are considered?


5.16 A course in English composition is taken by 10 freshmen, 15 sophomores, 
30 juniors, and 5 seniors. If 10 students are chosen at random, calculate 
the probability that this group will consist of 2 freshmen, 3 sophomores,
4 juniors, and 1 senior. 


5.17 If n countries exchange ambassadors, how many ambassadors are in- 
volved? What is this number for n = 10, 50, 100? 


5.18 From among n eligible draftees, m are to be drafted in such a way that all
possible combinations are equally likely to occur. What is the probability 
that a specified man is not drafted? 


5.19 From 10 positive and 6 negative numbers, 3 numbers are chosen at ran- 
dom and without repetitions. What is the probability that their product 
is a negative number? 


5.20 Two people toss independently n times each a coin whose probability of 
falling heads is p. What is the probability that they have the same number 
of heads? What does this probability become for p = - and any n? Also, 
for p = 5 andn = 5? 




5.21 A shipment of 2,000 light bulbs contains 200 defective items and 1,800 
good items. Five hundred bulbs are chosen at random and are tested, 
and the entire shipment is rejected if more than 25 bulbs from among 
those tested are found to be defective. What is the probability that the 
shipment will be accepted? 


5.22 A student is given a test consisting of 30 questions. For each question, 5 
different answers (of which only one is correct) are supplied. The student
is required to answer correctly at least 25 questions in order to pass
the test. If he/she knows the right answers to the first 20 questions and 
chooses an answer to the remaining questions at random and indepen- 
dently of each other, what is the probability that the student will pass the 
test? 


5.23 Three cards are drawn at random and without replacement from a standard
deck of 52 playing cards. Compute the probabilities $P(A_i)$, i =
1, ..., 4, where the events $A_i$, i = 1, ..., 4 are defined as follows:

$A_1$ = “all 3 cards are black,” $A_2$ = “exactly 1 card is an ace,”
$A_3$ = “1 card is a diamond, 1 card is a heart, and 1 card is a club,”
$A_4$ = “at least 2 cards are red.”


5.24 From an urn containing $n_R$ red balls, $n_B$ black balls, and $n_W$ white balls
(all identical except for color) 3 balls are drawn at random. Calculate the 
following probabilities: 

(i) All 3 balls are red. 
(ii) At least one ball is red. 
(iii) One ball is red, 1 is black, and 1 is white. 
Do this when the balls are drawn: 
(a) Successively and with replacement; 
(b) Without replacement. 


5.25 A student committee of 12 people is to be formed from among 100 fresh- 
men (40 male + 60 female), 80 sophomores (30 male and 50 female), 70 
juniors (24 male and 46 female), and 40 seniors (12 male and 28 female). 
Calculate the following probabilities: 

(i) Seven students are female and 5 are male. 

(ii) The committee consists of the same number of students from each 
class. 

(iii) The committee consists of 2 female students and 1 male student 
from each class. 

(iv) The committee includes at least 1 senior (one of whom will serve as 
the chairperson of the committee). 
The following tabular form of the data facilitates the calculations:

Class\Gender   Male   Female   Totals
Freshman         40       60      100
Sophomore        30       50       80
Junior           24       46       70
Senior           12       28       40
Totals          106      184      290

In this chapter, we discuss the following material. In Section 3.1, the concepts
of expectation and variance of a r.v. are introduced and interpretations are 
provided. Higher order moments are also defined and their significance is 
pointed out. Also, the moment generating function of a r.v. is defined, and 
its usefulness as a mathematical tool is commented upon. In Section 3.2, the 
Markov and Tchebichev inequalities are introduced and their role in estimating 
probabilities is explained. Section 3.3 is devoted to discussing some of the most 
commonly occurring distributions: They are the Binomial, Geometric, Poisson, 
Hypergeometric, Gamma (Negative Exponential and Chi-square), Normal, and 
Uniform distributions. In all cases, the mathematical expectation, variance, 
and the moment generating function involved are presented. The chapter is 
concluded with a discussion of the concepts of median and mode, which are 
illustrated by concrete examples. 


3.1 Expectation, Variance, and Moment Generating Function of a Random Variable

The ideal situation in life would be to know with certainty what is going to
happen next. This being almost never the case, the element of chance enters
in all aspects of our life. A r.v. is a mathematical formulation of a random
environment. Given that we have to deal with a r.v. X, the best thing to expect
is to know the values of X and the probabilities with which these values are
taken on, for the case that X is discrete, or the probabilities with which X
takes values in various subsets of the real line $\Re$ when X is of the continuous
type. That is, we would like to know the probability distribution of X. In real
life, often, even this is not feasible. Instead, we are forced to settle for some
numerical characteristics of the distribution of X. This line of argument leads
us to the concepts of the mathematical expectation and variance of a r.v., as
well as to moments of higher order.


DEFINITION 1 
Let X be a (discrete) r.v. taking on the values $x_i$ with corresponding
probabilities $f(x_i)$, i = 1, ..., n. Then the mathematical expectation of
X (or just expectation or mean value of X or just mean of X) is denoted
by EX and is defined by:

$$EX = \sum_{i=1}^{n} x_i f(x_i). \qquad (1)$$

If the r.v. X takes on (countably) infinitely many values $x_i$ with corresponding
probabilities $f(x_i)$, i = 1, 2, ..., then the expectation of X is defined
by:

$$EX = \sum_{i=1}^{\infty} x_i f(x_i), \quad \text{provided } \sum_{i=1}^{\infty} |x_i| f(x_i) < \infty. \qquad (2)$$

Finally, if the r.v. X is continuous with p.d.f. f, its expectation is defined
by:

$$EX = \int_{-\infty}^{\infty} x f(x)\,dx, \quad \text{provided this integral exists.} \qquad (3)$$

The alternative notations $\mu(X)$ or $\mu_X$ are also often used.


REMARK 1 


(i) The condition $\sum_{i=1}^{\infty} |x_i| f(x_i) < \infty$ is needed because, if it is violated, it is
known that $\sum_{i=1}^{\infty} x_i f(x_i)$ may take on different values, depending on the
order in which the terms involved are summed up. This, of course, would
render the definition of EX meaningless.

(ii) An example will be presented later on (see Exercise 1.16) where the integral
$\int_{-\infty}^{\infty} x f(x)\,dx = \infty - \infty$, so that it does not exist.


The expectation has several interpretations, some of which are illustrated 
by the following Examples 1 and 2. One basic interpretation, however, is that 
of center of gravity. Namely, if one considers the material system where mass
$f(x_i)$ is placed at the point $x_i$, i = 1, ..., n, then EX is the center of gravity
(point of equilibrium) of this system. In this sense, EX is referred to as a
measure of location of the distribution of X. The same interpretation holds
when X takes on (countably) infinitely many values or is of the continuous
type.

Suppose an insurance company pays the amount of $1,000 for lost luggage on
an airplane trip. From past experience, it is known that the company pays this
amount in 1 out of 200 policies it sells. What premium should the company
charge?


DISCUSSION Define the r.v. X as follows: X = 0 if no loss occurs, which
happens with probability 1 − (1/200) = 0.995, and X = −1,000 with probability
1/200 = 0.005. Then the expected loss to the company is: EX = −1,000 × 0.005 =
−5. Thus, the company must charge $5 to break even. To this, it will normally
add a reasonable amount for administrative expenses and a profit.

Even in this simple example, but most certainly so in more complicated
cases, it is convenient to present the values of a (discrete) r.v. and the
corresponding probabilities in a tabular form as follows.

x       0         −1,000    Total
f(x)    199/200   1/200     1

















A roulette wheel consists of 18 black slots, 18 red slots, and 2 green slots. If a 
gambler bets $10 on red, what is the gambler's expected gain or loss? 


DISCUSSION Define the r.v. X by: X = 10 with probability 18/38 and
X = −10 with probability 20/38, or in a tabular form

x       10      −10     Total
f(x)    18/38   20/38   1

Then EX = 10 × (18/38) − 10 × (20/38) = −10/19 ≃ −0.526. Thus, the gambler's expected
loss is about 53 cents.
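
Formula (1) translates directly into a short function once a discrete distribution is stored as a table of values and probabilities. The sketch below is illustrative: representing the distribution as a Python dictionary and the name `expectation` are assumptions made here, and exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction


def expectation(dist):
    """EX for a finite discrete distribution given as {value: probability}."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in dist.items())


# Example 1: the company's gain is 0 or -1,000 per policy.
luggage = {0: Fraction(199, 200), -1000: Fraction(1, 200)}
# Example 2: the gambler's gain on a 10-unit bet on red.
roulette = {10: Fraction(18, 38), -10: Fraction(20, 38)}

print(expectation(luggage))
print(expectation(roulette), round(float(expectation(roulette)), 3))
```

The dictionary mirrors the tabular form used in the two examples: each key is a value x and each entry is the corresponding f(x).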





From the definition of the expectation and familiar properties of summa- 
tions or integrals, it follows that: 


E(cX) = cEX,  E(cX + d) = cEX + d,  where c and d are constants. (4)

Also (see Exercise 1.18),

X ≥ c constant, implies EX ≥ c, and, in particular, X ≥ 0 implies EX ≥ 0. (5)


Now if Y is a r.v. which is a function of X, Y = g(X), then, in principle, one 
may be able to determine the p.d.f. of Y and proceed to defining its expectation 
by the appropriate version of formulas (1), (2), (3). It can be shown, however, 
that this is not necessary. Instead, the expectation of Y is defined by using the 
p.d.f. of X, namely: 


$$EY = \sum_{i=1}^{n} g(x_i) f(x_i) \quad \text{or} \quad EY = \sum_{i=1}^{\infty} g(x_i) f(x_i) \quad \text{or} \quad EY = \int_{-\infty}^{\infty} g(x) f(x)\,dx, \qquad (6)$$

under provisions similar to the ones mentioned in connection with (2) and (3).
By taking $g(x) = x^k$, where k is a positive integer, we obtain the kth moment
of X:

$$EX^k = \sum_{i=1}^{n} x_i^k f(x_i) \quad \text{or} \quad EX^k = \sum_{i=1}^{\infty} x_i^k f(x_i) \quad \text{or} \quad EX^k = \int_{-\infty}^{\infty} x^k f(x)\,dx. \qquad (7)$$


For k = 1, we revert to the expectation of X, and for k = 2, we get its second 
moment. Moments are important, among other things, in that, in certain cir- 
cumstances, a number of them completely determine the distribution of X. 
This will be illustrated by concrete cases in Section 3.3 (see also Remark 4). 

The following simple example illustrates that the expectation, as a mea- 
sure of location of the distribution, may reveal very little about the entire 
distribution. Indeed, let the r.v. X take on the values −1, 1, and 2 with
corresponding probabilities 5/8, 1/8, and 2/8, so that EX = 0. Also, let the r.v. Y take
on the values −10, 10, and 20 with respective probabilities 5/8, 1/8, and 2/8; then
again EY = 0. The distribution of X is over an interval of length 3, whereas
the distribution of Y is over an interval of length 10 times as large. Yet, they 
have the same center of location. This simple example, clearly, indicates that 
the expectation by itself is not an adequate measure of description of a distri- 
bution, and an additional measure is needed to be associated with the spread 
of a distribution. Such a measure exists and is the variance of a r.v. or of its 
distribution. 


DEFINITION 2 
The variance of a r.v. X is denoted by Var(X) and is defined by:

$$\operatorname{Var}(X) = E(X - EX)^2. \qquad (8)$$

The explicit expression of the right-hand side in (8) is taken from (6) for
$g(x) = (x - EX)^2$. The alternative notations $\sigma^2(X)$ and $\sigma_X^2$ are also often
used for the Var(X).


For the r.v.'s X and Y mentioned before, we have Var(X) = 1.75 and 
Var(Y) = 175. Thus, the variance does convey adequately the difference in 
size of the range of the distributions of the r.v.’s X and Y. More generally, for
a r.v. X taking on finitely many values $x_1, \ldots, x_n$ with respective probabilities
$f(x_1), \ldots, f(x_n)$, the variance is: $\operatorname{Var}(X) = \sum_{i=1}^{n} (x_i - EX)^2 f(x_i)$ and represents
the sum of the weighted squared distances of the points $x_i$, i = 1, ..., n,
from the center of location of the distribution, EX. Thus, the further from
EX the $x_i$'s are located, the larger the variance, and vice versa. The same
interpretation holds for the case that X takes on (countably) infinitely many values
or is of the continuous type. Because of this characteristic property of the vari- 
ance, the variance is referred to as a measure of dispersion of the underlying 
distribution. In mechanics, the variance is referred to as the moment of inertia. 

The positive square root of the Var(X) is called the standard deviation
(s.d.) of X. Unlike the variance, the s.d. is measured in the same units
as X (and EX) and serves as a yardstick of measuring deviations of X
from EX.

From (8), (6), and familiar properties of summations and integrals, one
obtains:

$$\operatorname{Var}(X) = EX^2 - (EX)^2. \qquad (9)$$

This formula often facilitates the actual calculation of the variance. From (8),
it also follows immediately that

$$\operatorname{Var}(cX) = c^2 \operatorname{Var}(X), \quad \operatorname{Var}(cX + d) = c^2 \operatorname{Var}(X), \quad \text{where } c \text{ and } d \text{ are constants.} \qquad (10)$$

For a r.v. Y which is a function of X, Y = g(X), the calculation of the
Var[g(X)] reduces to calculating expectations as in (6) because, by means of
(8) and (9):

$$\operatorname{Var}[g(X)] = \operatorname{Var}(Y) = E(Y - EY)^2 = EY^2 - (EY)^2 = Eg^2(X) - [Eg(X)]^2. \qquad (11)$$

Formulas (8) and (9) are special cases of (11).

In reference to Examples 1 and 2, the variances and the s.d.'s of the r.v.'s
involved are: $\sigma^2(X) = 4{,}975$, $\sigma(X) \simeq 70.534$, and $\sigma^2(X) = \frac{36{,}000}{361} \simeq 99.723$,
$\sigma(X) \simeq 9.986$, respectively.
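
The defining formula (8) and the computational formula (9) can be compared on the distributions met so far. The following sketch is illustrative only: the function names are assumptions, and for the r.v.'s X and Y it assumes the probabilities 5/8, 1/8, and 2/8, which are the values consistent with EX = 0 and Var(X) = 1.75.

```python
from fractions import Fraction
from math import sqrt


def expectation(dist):
    """EX for a finite discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X - EX)^2, the defining formula (8)."""
    mean = expectation(dist)
    return sum((x - mean) ** 2 * p for x, p in dist.items())


def variance_shortcut(dist):
    """Var(X) = EX^2 - (EX)^2, the computational formula (9)."""
    second_moment = sum(x ** 2 * p for x, p in dist.items())
    return second_moment - expectation(dist) ** 2


weights = (Fraction(5, 8), Fraction(1, 8), Fraction(2, 8))  # assumed probabilities
x_dist = dict(zip((-1, 1, 2), weights))
y_dist = dict(zip((-10, 10, 20), weights))
luggage = {0: Fraction(199, 200), -1000: Fraction(1, 200)}  # Example 1
roulette = {10: Fraction(18, 38), -10: Fraction(20, 38)}    # Example 2

for name, dist in [("X", x_dist), ("Y", y_dist),
                   ("luggage", luggage), ("roulette", roulette)]:
    var = variance(dist)
    print(name, var, var == variance_shortcut(dist), round(sqrt(var), 3))
```

Since Y = 10X here, the output also illustrates (10): the variance of Y is 100 times that of X, while the s.d. is only 10 times as large.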


Let X be a r.v. with p.d.f. $f(x) = 3x^2$, 0 < x < 1. Then:

(i) Calculate the quantities: EX, $EX^2$, and Var(X).
(ii) If the r.v. Y is defined by: Y = 3X − 2, calculate the EY and the Var(Y).


DISCUSSION

(i) By (3), $EX = \int_0^1 x \cdot 3x^2\,dx = \frac{3}{4}x^4\Big|_0^1 = \frac{3}{4} = 0.75$, whereas by (7), applied
with k = 2, $EX^2 = \int_0^1 x^2 \cdot 3x^2\,dx = \frac{3}{5} = 0.60$, so that, by (9), Var(X) =
0.60 − (0.75)² = 0.0375.

(ii) By (4) and (6), EY = E(3X − 2) = 3EX − 2 = 3 × 0.75 − 2 = 0.25, whereas
by (10), Var(Y) = Var(3X − 2) = 9 Var(X) = 9 × 0.0375 = 0.3375.

In (6), the EY was defined for Y = g(X), some function of X. In particular,
we may take $Y = e^{tX}$ for an arbitrary but fixed $t \in \Re$. Assuming that there exist
t’s in $\Re$ for which $Ee^{tX}$ is finite, then this expectation defines a function in t.
This function is denoted by M(t) and is called the moment generating function
of X. That is,


DEFINITION 3
The function $M(t) = Ee^{tX}$, defined for all those t in $\Re$ for which $Ee^{tX}$ is
finite, is called the moment generating function (m.g.f.) of X.


Sometimes the notation $M_X(t)$ is also used to emphasize the fact that the
m.g.f. under discussion is that of the r.v. X. The m.g.f. of any r.v. always exists
for t = 0, since $Ee^0 = E1 = 1$; it may exist only for t = 0, or for t in a proper
subset (interval) in $\Re$, or for every t in $\Re$. All these points will be demonstrated
by concrete examples later on (see, for example, relations (20), (22), (24), (31),
(33), (37), (44), and (46)). The following properties of M(t) follow immediately
from its definition.


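$$M_{cX}(t) = M_X(ct), \qquad M_{cX+d}(t) = e^{dt}M_X(ct), \quad \text{where c and d are constants.} \qquad (12)$$
Indeed,
$$M_{cX}(t) = Ee^{t(cX)} = Ee^{(ct)X} = M_X(ct),$$
and
$$M_{cX+d}(t) = Ee^{t(cX+d)} = E\big[e^{dt} \cdot e^{(ct)X}\big] = e^{dt}Ee^{(ct)X} = e^{dt}M_X(ct).$$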
Under certain conditions, it is also true that: 
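$$\frac{d}{dt}M_X(t)\Big|_{t=0} = EX \quad \text{and} \quad \frac{d^n}{dt^n}M_X(t)\Big|_{t=0} = EX^n, \quad n = 2, 3, \ldots. \qquad (13)$$

For example, for the first property, we have:

$$\frac{d}{dt}M_X(t)\Big|_{t=0} = \Big(\frac{d}{dt}Ee^{tX}\Big)\Big|_{t=0} = E\Big(\frac{d}{dt}e^{tX}\Big|_{t=0}\Big) = E\big(Xe^{tX}\big|_{t=0}\big) = EX.$$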


What is required for this derivation to be legitimate is that the order in which
the operators $\frac{d}{dt}$ and E operate on $e^{tX}$ can be interchanged. The justification
of the property in (13) for $n \ge 2$ is quite similar. On account of property (13),
Mx(t) generates the moments of X through differentiation and evaluation of 
the derivatives at t = 0. It is from this property that the m.g.f. derives its name. 
The m.g.f. is also a valuable mathematical tool in many other cases, some 
of which will be dealt with in subsequent chapters. Presently, it suffices only 
to state one fundamental property of the m.g.f. in the form of a proposition. 


PROPOSITION 1 Under certain conditions, the m.g.f. $M_X$ of a r.v. X
uniquely determines the distribution of X.


This proposition is, actually, a rather deep probability result and it is
referred to as the inversion formula.

Some forms of such a formula for characteristic functions, which are a
version of an m.g.f., may be found, e.g., on pages 141-145 of A Course in
Mathematical Statistics, 2nd edition (1997), Academic Press, by G. G. Roussas.

Still another important result associated with m.g.f.'s is stated (but not 
proved) in the following proposition. 


PROPOSITION 2 If for the r.v. X all moments $EX^n$, n = 1, 2, ... are finite,
then, under certain conditions, these moments uniquely determine the m.g.f.
$M_X$ of X, and hence (by Proposition 1) the distribution of X.
Exercise 3.49 provides an example of an application of the proposition just
stated.

For Examples 1 and 2, the m.g.f.'s of the r.v.'s involved are: $M_X(t) =
0.995 + 0.005e^{1{,}000t}$, $t \in \mathbb{R}$, and $M_X(t) = \frac{1}{19}(9e^{10t} + 10e^{-10t})$, $t \in \mathbb{R}$. Then,
by differentiation, we get: $\frac{d}{dt}M_X(t)|_{t=0} = 5 = EX$, $\frac{d^2}{dt^2}M_X(t)|_{t=0} = 5{,}000 =
EX^2$, so that $\sigma^2(X) = 4{,}975$, and $\frac{d}{dt}M_X(t)|_{t=0} = -\frac{10}{19} = EX$, $\frac{d^2}{dt^2}M_X(t)|_{t=0} =
100 = EX^2$, so that $\sigma^2(X) = \frac{36{,}000}{361} \approx 99.723$.


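EXAMPLE 4
Let X be a r.v. with p.d.f. $f(x) = e^{-x}$, x > 0. Then:

(i) Find the m.g.f. $M_X(t)$ for the t's for which it is finite.
(ii) Using $M_X$, obtain the quantities: EX, $EX^2$, and Var(X).
(iii) If the r.v. Y is defined by: Y = 2 - 3X, determine $M_Y(t)$ for the t's for which
it is finite.


DISCUSSION
(i) By Definition 3,

$$M_X(t) = Ee^{tX} = \int_0^\infty e^{tx} \cdot e^{-x}\,dx = \int_0^\infty e^{-(1-t)x}\,dx$$
$$= -\frac{1}{1-t}e^{-(1-t)x}\Big|_0^\infty \quad (\text{provided } t \ne 1)$$
$$= -\frac{1}{1-t}(0 - 1) = \frac{1}{1-t} \quad (\text{provided } 1 - t > 0 \text{ or } t < 1).$$
Thus, $M_X(t) = \frac{1}{1-t}$, t < 1.

(ii) By (13), $\frac{d}{dt}M_X(t)|_{t=0} = \frac{d}{dt}\big(\frac{1}{1-t}\big)|_{t=0} = \frac{1}{(1-t)^2}|_{t=0} = 1 = EX$, $\frac{d^2}{dt^2}M_X(t)|_{t=0} =
\frac{d}{dt}\big(\frac{1}{(1-t)^2}\big)|_{t=0} = \frac{2}{(1-t)^3}|_{t=0} = 2 = EX^2$, so that, by (9), $\mathrm{Var}(X) = 2 - 1^2 = 1$.
(iii) By (12), $M_Y(t) = M_{2-3X}(t) = M_{-3X+2}(t) = e^{2t}M_X(-3t) = \frac{e^{2t}}{1-(-3t)} = \frac{e^{2t}}{1+3t}$,
provided $t > -\frac{1}{3}$.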
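Property (13) can also be illustrated numerically: central finite differences of the m.g.f. at t = 0 should approximate the first two moments. The following is a rough sketch for the p.d.f. $f(x) = e^{-x}$, x > 0, whose m.g.f. is $1/(1-t)$ for t < 1; the function name `mgf` and the step size `h` are arbitrary choices made for illustration.

```python
def mgf(t):
    """m.g.f. 1/(1 - t) of the p.d.f. f(x) = e^{-x}, x > 0; finite only for t < 1."""
    if t >= 1:
        raise ValueError("M_X(t) is finite only for t < 1")
    return 1.0 / (1.0 - t)

h = 1e-4  # step for the central differences (an arbitrary small value)

# First derivative at t = 0 approximates EX
first = (mgf(h) - mgf(-h)) / (2 * h)
# Second derivative at t = 0 approximates EX^2
second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
# Formula (9): Var(X) = EX^2 - (EX)^2
variance = second - first ** 2

print(first, second, variance)
```

The finite-difference values are approximations only; they agree with the exact derivatives up to an error governed by `h` and by floating-point rounding.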





In several calculations required in solving some exercises in this section, the 
following formulas prove very useful. 
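First, in summing the infinite terms of a geometric series, we have:
$$\sum_{x=k}^{\infty} t^x = \frac{t^k}{1-t}, \quad k = 0, 1, \ldots, \text{ and } |t| < 1.$$
Next,
$$\sum_{x=1}^{\infty} xt^x = t\sum_{x=1}^{\infty} xt^{x-1} = t\sum_{x=1}^{\infty}\frac{d}{dt}t^x = t\frac{d}{dt}\sum_{x=1}^{\infty}t^x = t\frac{d}{dt}\Big(\frac{t}{1-t}\Big) = \frac{t}{(1-t)^2}.$$
Also,
$$\sum_{x=2}^{\infty} x(x-1)t^x = t^2\sum_{x=2}^{\infty} x(x-1)t^{x-2} = t^2\sum_{x=2}^{\infty}\frac{d^2}{dt^2}t^x = t^2\frac{d^2}{dt^2}\sum_{x=2}^{\infty}t^x = t^2\frac{d^2}{dt^2}\Big(\frac{t^2}{1-t}\Big) = \frac{2t^2}{(1-t)^3}.$$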


In the last two formulas, if there is a number instead of the variable t, the
number is replaced by t, and in the final formulas t is replaced by the number.
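As a quick sanity check, the three closed forms, $\sum_{x \ge 0} t^x = 1/(1-t)$, $\sum_{x \ge 1} xt^x = t/(1-t)^2$, and $\sum_{x \ge 2} x(x-1)t^x = 2t^2/(1-t)^3$ for |t| < 1, can be compared with long partial sums. This is only a sketch; the helper `partial_sum`, the value t = 0.3, and the number of terms are illustrative assumptions.

```python
def partial_sum(term, start, n_terms=200):
    """Sum term(x) for x = start, ..., start + n_terms - 1."""
    return sum(term(x) for x in range(start, start + n_terms))

t = 0.3  # any value with |t| < 1 would do

s1 = partial_sum(lambda x: t ** x, 0)                # geometric series with k = 0
s2 = partial_sum(lambda x: x * t ** x, 1)            # sum of x t^x
s3 = partial_sum(lambda x: x * (x - 1) * t ** x, 2)  # sum of x(x-1) t^x

closed1 = 1 / (1 - t)
closed2 = t / (1 - t) ** 2
closed3 = 2 * t ** 2 / (1 - t) ** 3

print(s1 - closed1, s2 - closed2, s3 - closed3)
```

With 200 terms and t = 0.3 the neglected tails are far below floating-point precision, so the printed differences should be negligible.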





1.1 Refer to Exercise 2.1 in Chapter 2 and calculate the quantities: EX, 
Var(X ), and the s.d. of X. 


1.2 For the r.v. X for which P(X = -c) = P(X = c) = 1/2 (for some c > 0):
(i) Calculate the EX and the Var(X).
(ii) Show that $P(|X - EX| \le c) = \mathrm{Var}(X)/c^2$.





1.3 A chemical company currently has in stock 100 lb of a certain chemical, 
which it sells to customers in 5 lb packages. Let X be the r.v. denoting 
the number of packages ordered by a randomly chosen customer, and 
suppose that the p.d.f. of X is given by: f(1) = 0.2, f(2) = 0.4, f(3) = 0.3,
f(4) = 0.1.

x    | 1   | 2   | 3   | 4
f(x) | 0.2 | 0.4 | 0.3 | 0.1

(i) Compute the following quantities: EX, $EX^2$, and Var(X).

(ii) Compute the expected number of pounds left after the order of the
customer in question has been shipped, as well as the s.d. of the
number of pounds around the expected value.


1.4 Let X be a r.v. denoting the damage incurred (in $) in a certain type of 
accident during a given year, and suppose that the distribution of X is 
given by the following table: 





x    | 0   | 1,000 | 5,000 | 10,000
f(x) | 0.8 | 0.1   | 0.08  | 0.02

A particular company offers a $500 deductible policy. If the company’s 
expected profit on a given policy is $100, what premium amount should 
it charge? 


1.5 Let X be the r.v. denoting the number in the uppermost side of a fair die 
when rolled once. 
(i) Determine the m.g.f. of X. 
(ii) Use the m.g.f. to calculate: EX, $EX^2$, Var(X), and the s.d. of X.


1.6 For any r.v. X, for which the EX and the $EX^2$ are finite, show that:
$$\mathrm{Var}(X) = EX^2 - (EX)^2 = E[X(X - 1)] + EX - (EX)^2.$$

1.7 Suppose that for a r.v. X it is given that: EX = 5 and E[X(X - 1)] = 27.5.
Calculate:
(i) $EX^2$.
(ii) Var(X) and s.d. of X.

1.8 For the r.v. X with p.d.f. $f(x) = (1/2)^x$, x = 1, 2, ...:
(i) Calculate the EX and the E[X(X - 1)].
(ii) Use part (i) and Exercise 1.6 to compute the Var(X).

1.9 The p.d.f. f of a r.v. X is given by: $f(x) = c(1/3)^x$, for x = 0, 1, ...
(c some positive constant).
(i) Calculate the EX.
(ii) Determine the m.g.f. $M_X$ of X and specify the range of its argument.
(iii) Employ the m.g.f. in order to derive the EX.


1.10 For the r.v. X with p.d.f. f(x) = 0.5x, for 0 < x < 2, calculate: EX, Var(X),
and the s.d. of X.


1.11 If the rv. X has p.d.f. f(x) = 3x? — 2x + 1, for 0 < x < 1, compute the 
expectation and variance of X. 


1.12 If the rv. X has p.d.f. f given by: 


Gx, —2<x<0 
S(@) = 402%, O<x<l 
0, otherwise, 


and if we suppose that EX = i, determine the constants cı and c2. 


1.13 The lifetime in hours of electric tubes is a r.v. X with p.d.f. f given by:
$f(x) = \lambda^2xe^{-\lambda x}$, for x > 0 ($\lambda > 0$). Calculate the expected life of such
tubes.


1.14 Let X be a r.v. whose $EX = \mu \in \mathbb{R}$. Then:
(i) For any constant c, show that:

$$E(X - c)^2 = E(X - \mu)^2 + (\mu - c)^2 = \mathrm{Var}(X) + (\mu - c)^2.$$

(ii) Use part (i) to show that $E(X - c)^2$, as a function of c, is minimized
for $c = \mu$.

1.15 Let X be a r.v. with p.d.f. $f(x) = \frac{1}{2c}$, for -c < x < c, c > 0. For any
n = 1, 2, ..., calculate the $EX^n$, and as a special case, derive the EX and
the Var(X).


1.16 Let X be a r.v. with p.d.f. given by: $f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}$, $x \in \mathbb{R}$. Show that:

(i) f is, indeed, a p.d.f. (called the Cauchy p.d.f.).
(ii) $\int_{-\infty}^{\infty} xf(x)\,dx = \infty - \infty$, so that the EX does not exist.


1.17 If X is a r.v. for which all moments $EX^n$, n = 0, 1, ... are finite, show that

$$M_X(t) = \sum_{n=0}^{\infty}(EX^n)\frac{t^n}{n!}.$$

Hint: Use the expansion $e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}$.


Remark: The result in this exercise says that the moments of a r.v. 
determine (under certain conditions) the m.g.f. of the r.v., and hence 
its distribution. 


1.18 Establish the inequalities stated in relation (5) for both the discrete and 
the continuous case. 


3.2 Some Probability Inequalities


If the r.v. X has a known p.d.f. f, then, in principle, we can calculate
probabilities $P(X \in B)$ for $B \subseteq \mathbb{R}$. This, however, is easier said than done in practice.
What one would be willing to settle for would be some suitable and computable
bounds for such probabilities. This line of thought leads us to the inequalities
discussed here.


THEOREM 1

(i) For any nonnegative r.v. X and for any constant c > 0, it holds:
$$P(X \ge c) \le EX/c.$$

(ii) More generally, for any nonnegative function of any r.v. X, g(X),
and for any constant c > 0, it holds:
$$P[g(X) \ge c] \le Eg(X)/c. \qquad (14)$$

(iii) By taking $g(X) = |X - EX|^r$ in part (ii), the inequality reduces to the
Markov inequality, namely,
$$P(|X - EX| \ge c) = P(|X - EX|^r \ge c^r) \le E|X - EX|^r/c^r, \quad r > 0. \qquad (15)$$

(iv) In particular, for r = 2 in (15), we get the Tchebichev inequality,
namely,
$$P(|X - EX| \ge c) \le \frac{E(X - EX)^2}{c^2} = \frac{\sigma^2}{c^2} \quad \text{or} \quad P(|X - EX| < c) \ge 1 - \frac{\sigma^2}{c^2}, \qquad (16)$$
where $\sigma^2$ stands for the Var(X). Furthermore, if $c = k\sigma$, where $\sigma$ is
the s.d. of X, then:
$$P(|X - EX| \ge k\sigma) \le \frac{1}{k^2} \quad \text{or} \quad P(|X - EX| < k\sigma) \ge 1 - \frac{1}{k^2}. \qquad (17)$$








REMARK 2 From the last expression, it follows that X lies within k s.d.'s
from its mean with probability at least $1 - \frac{1}{k^2}$, regardless of the distribution of
X. It is in this sense that the s.d. is used as a yardstick of deviations of X from
its mean, as already mentioned elsewhere.
Thus, for example, for k = 2, 3, we obtain, respectively:

$$P(|X - EX| < 2\sigma) \ge 0.75, \qquad P(|X - EX| < 3\sigma) \ge \frac{8}{9} \approx 0.889. \qquad (18)$$


PROOF OF THEOREM 1 Clearly, all one has to do is to justify (14) and
this only for the case that X is continuous with p.d.f. f, because the discrete
case is entirely analogous.

Indeed, let $A = \{x \in \mathbb{R};\ g(x) \ge c\}$, so that $A^c = \{x \in \mathbb{R};\ g(x) < c\}$. Then,
clearly:

$$Eg(X) = \int_{-\infty}^{\infty} g(x)f(x)\,dx = \int_A g(x)f(x)\,dx + \int_{A^c} g(x)f(x)\,dx$$
$$\ge \int_A g(x)f(x)\,dx \quad (\text{since } g(x) \ge 0)$$
$$\ge c\int_A f(x)\,dx \quad (\text{since } g(x) \ge c \text{ on } A)$$
$$= cP(A) = cP[g(X) \ge c].$$
Solving for $P[g(X) \ge c]$, we obtain the desired result. ▲


Let the r.v. X take on the values -2, -1/2, 1/2, and 2 with respective
probabilities 0.05, 0.45, 0.45, and 0.05. Then EX = 0 and $\sigma^2 = \mathrm{Var}(X) = 0.625$, so
that $2\sigma \approx 1.581$. Then: $P(|X| < 2\sigma) = P(-1.581 < X < 1.581) = P(X =
-\frac{1}{2}) + P(X = \frac{1}{2}) = 0.90$, compared with the lower bound of 0.75.


Let the r.v. X take on the value x with probability $f(x) = e^{-\lambda}\frac{\lambda^x}{x!}$, x = 0, 1, ..., for
some $\lambda > 0$. As will be seen later on (see relation (23)), this is a Poisson r.v. with
parameter $\lambda$, and $EX = \mathrm{Var}(X) = \lambda$. For selected values of $\lambda$, probabilities
$P(X \le k)$ are given by the Poisson tables. For illustrative purposes, let $\lambda = 9$.
Then $\sigma = 3$ and therefore: $P(|X - 9| < 2 \times 3) = P(3 < X < 15) = 0.9373$,
compared with 0.75, and $P(|X - 9| < 3 \times 3) = P(0 < X < 18) = 0.9946$,
compared with 0.889.
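The comparison between the exact probabilities and the Tchebichev lower bounds can be reproduced by direct summation of Poisson probabilities. The sketch below assumes the Poisson p.d.f. $e^{-\lambda}\lambda^x/x!$ with $\lambda = 9$; the function names `poisson_pmf` and `prob_within` are hypothetical helpers introduced only for this illustration.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """Poisson probability e^{-lam} lam^x / x! for x = 0, 1, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def prob_within(k, lam):
    """P(|X - lam| < k * sigma) for X Poisson with parameter lam (sigma = sqrt(lam))."""
    sigma = sqrt(lam)
    upper = int(lam + k * sigma) + 1  # no integer beyond this can satisfy the inequality
    return sum(poisson_pmf(x, lam)
               for x in range(0, upper + 1)
               if abs(x - lam) < k * sigma)

lam = 9
exact_2 = prob_within(2, lam)   # P(3 < X < 15)
exact_3 = prob_within(3, lam)   # P(0 < X < 18)
bound_2 = 1 - 1 / 2 ** 2        # Tchebichev lower bound for k = 2
bound_3 = 1 - 1 / 3 ** 2        # Tchebichev lower bound for k = 3

print(exact_2, bound_2, exact_3, bound_3)
```

The exact probabilities exceed the bounds by a wide margin, which is typical: the Tchebichev inequality holds for every distribution with finite variance and is therefore conservative for any particular one.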





2.1 Suppose the distribution of the r.v. X is given by the following table:

x    | -1   | 0   | 1
f(x) | 1/18 | 8/9 | 1/18

(i) Calculate the EX (call it $\mu$), the Var(X), and the s.d. of X (call it $\sigma$).
(ii) Compute the probability: $P(|X - \mu| \ge k\sigma)$ for k = 2, 3.
(iii) By the Tchebichev inequality: $P(|X - \mu| \ge k\sigma) \le 1/k^2$. Compare the
exact probabilities computed in part (ii) with the respective upper
bounds.

2.2 If X is a r.v. with expectation $\mu$ and s.d. $\sigma$, use the Tchebichev inequality:
(i) To determine c in terms of $\sigma$ and $\alpha$, so that:

$$P(|X - \mu| < c) \ge \alpha \quad (0 < \alpha < 1).$$

(ii) Give the numerical value of c for $\sigma = 1$ and $\alpha = 0.95$.


2.3 Let X be a r.v. with p.d.f. $f(x) = c(1 - x^2)$, for -1 < x < 1. Refer to
Exercise 2.11(i) in Chapter 2 for the determination of the constant c and
then:

(i) Calculate the EX and Var(X).

(ii) Use the Tchebichev inequality to find a lower bound for the
probability P(-0.9 < X < 0.9), and compare it with the exact probability
calculated in Exercise 2.11(ii) in Chapter 2.


2.4 Let X be a r.v. with (finite) mean $\mu$ and variance 0. Then:
(i) Use the Tchebichev inequality to show that $P(|X - \mu| > c) = 0$ for all
c > 0.
(ii) Use part (i) and Theorem 2 in Chapter 2 in order to conclude that
$P(X = \mu) = 1$.


3.3 Some Special Random Variables


3.3.1 The Discrete Case


In this section, we discuss seven distributions — four discrete and three of the 
continuous type, which occur often. These are the Binomial, the Geometric, 
the Poisson, the Hypergeometric, the Gamma (which includes the Negative 
Exponential and the Chi-Square), the Normal, and the Uniform distributions. 
At this point, it should be mentioned that a p.d.f. is 0 for all the values of its 
argument not figuring in its definition. 


Binomial Distribution We first introduce the concept of a binomial
experiment, which is meant to be an experiment resulting in two possible
outcomes, one termed a success, denoted by S and occurring with probability
p, and the other termed a failure, denoted by F and occurring with
probability q = 1 - p. A binomial experiment is performed n independent times
(with p remaining the same), and let X be the r.v. denoting the number of
successes. Then, clearly, X takes on the values 0, 1, ..., n, with the respective
probabilities:

$$P(X = x) = f(x) = \binom{n}{x}p^xq^{n-x}, \quad x = 0, 1, \ldots, n, \ 0 < p < 1, \ q = 1 - p. \qquad (19)$$

The r.v. X is said to be Binomially distributed, its distribution is called
Binomial with parameters n and p, and the fact that X is so distributed is denoted
by X ~ B(n, p). The graph of f depends on n and p; two typical cases, for
n = 12, p = 1/4, and n = 10, p = 1/2, are given in Figures 3.1 and 3.2.


Values of the p.d.f. f of the B(12, 1/4) distribution

f(0) = 0.0317     f(7) = 0.0115
f(1) = 0.1267     f(8) = 0.0024
f(2) = 0.2323     f(9) = 0.0004
f(3) = 0.2581     f(10) = 0.0000
f(4) = 0.1936     f(11) = 0.0000
f(5) = 0.1032     f(12) = 0.0000
f(6) = 0.0401

Figure 3.1 Graph of the p.d.f. of the Binomial Distribution for n = 12, p = 1/4


Values of the p.d.f. f of the B(10, 1/2) distribution

f(0) = 0.0010     f(6) = 0.2051
f(1) = 0.0097     f(7) = 0.1172
f(2) = 0.0440     f(8) = 0.0440
f(3) = 0.1172     f(9) = 0.0097
f(4) = 0.2051     f(10) = 0.0010
f(5) = 0.2460

Figure 3.2 Graph of the p.d.f. of the Binomial Distribution for n = 10, p = 1/2


For selected n and p, the d.f. $F(k) = \sum_{j=0}^{k}\binom{n}{j}p^jq^{n-j}$ is given by tables,
the Binomial tables (see, however, Exercise 3.1). The individual probabilities
$\binom{n}{j}p^jq^{n-j}$ may be found by subtraction. Alternatively, such probabilities can
be calculated recursively (see Exercise 3.9).

For n = 1, the corresponding r.v. is known as the Bernoulli r.v. It is then
clear that a B(n, p) r.v. X is the sum of n B(1, p) r.v.'s. More precisely, in n
independent binomial experiments, associate with the ith performance of the
experiment the r.v. $X_i$ defined by: $X_i = 1$ if the outcome is S (a success) and $X_i = 0$
otherwise, i = 1, ..., n. Then, clearly, $\sum_{i=1}^{n}X_i$ is the number of 1's in the n
trials, or, equivalently, the number of S's, which is exactly what the r.v. X stands
for. Thus, $X = \sum_{i=1}^{n}X_i$. Finally, it is mentioned here that, if X ~ B(n, p), then:

$$EX = np, \quad \mathrm{Var}(X) = npq, \quad \text{and} \quad M_X(t) = (pe^t + q)^n, \ t \in \mathbb{R}. \qquad (20)$$


The relevant derivations are left as exercises (see Exercises 3.10 and 3.11). 
A brief justification of formula (19) is as follows: Think of the n outcomes of
the n experiments as n points on a straight line segment, where an S or an F
is to be placed. By independence, the probability that there will be exactly x
S's in x specified positions (and therefore n - x F's in the remaining positions)
is $p^xq^{n-x}$, and this probability is independent of the locations where the x
S's occur. Because there are $\binom{n}{x}$ ways of selecting x points for the S's, the
conclusion follows.

Finally, for illustrative purposes, refer to Example 7 in Chapter 1. In that
example, clearly X ~ B(n, 0.8), and for the sake of specificity take n = 25,
so that X takes on the values 0, 1, ..., 25. Next (see Exercise 3.1), $\binom{25}{x}(0.8)^x
(0.2)^{25-x} = \binom{25}{y}(0.2)^y(0.8)^{25-y}$, where y = 25 - x. Therefore, for a = 15 and
b = 20, for example, $P(15 \le X \le 20) = \sum_{y=5}^{10}\binom{25}{y}(0.2)^y(0.8)^{25-y} = 0.994 -
0.421 = 0.573$. Finally, $EX = 25 \times 0.8 = 20$, $\mathrm{Var}(X) = 25 \times 0.8 \times 0.2 = 4$, so
that $\sigma(X) = 2$. Examples 8-10 in Chapter 1 fit into the same framework.
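Where Binomial tables are not at hand, the Binomial probabilities $\binom{n}{x}p^x(1-p)^{n-x}$ can be summed directly. The following sketch does this for a B(25, 0.8) r.v. and also checks EX = np and Var(X) = npq by summation; the function name `binom_pmf` is a hypothetical helper, not something defined in the text.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability C(n, x) p^x (1 - p)^(n - x), x = 0, 1, ..., n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 25, 0.8

# P(15 <= X <= 20), summed term by term
prob = sum(binom_pmf(x, n, p) for x in range(15, 21))

# Mean and variance computed from the definition, to compare with np and npq
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(prob, mean, var)
```

The summed probability agrees with the table-based value up to the rounding of the three-decimal table entries.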


Geometric Distribution This distribution arises in a binomial experiment
situation when trials are carried out independently (with constant probability
p of an S) until the first S occurs. The r.v. X denoting the number of required
trials is a Geometrically distributed r.v. with parameter p and its distribution
is the Geometric distribution with parameter p. It is clear that X takes on the
values 1, 2, ... with the respective probabilities:

$$P(X = x) = f(x) = pq^{x-1}, \quad x = 1, 2, \ldots, \ 0 < p < 1, \ q = 1 - p. \qquad (21)$$

The justification of this formula is immediate because, if the first S is to appear
in the xth position, the overall outcome is FF...FS (with x - 1 F's), whose probability (by
independence) is $q^{x-1}p$.

The graph of f depends on p; two typical cases for p = 1/4 and p = 1/2 are
given in Figure 3.3.


Values of $f(x) = (0.25)(0.75)^{x-1}$, x = 1, 2, ..., 10, and of $f(x) = (0.5)^x$, x = 1, 2, ..., 7

x    | p = 1/4 | p = 1/2
f(1) | 0.2500  | 0.5000
f(2) | 0.1875  | 0.2500
f(3) | 0.1406  | 0.1250
f(4) | 0.1055  | 0.0625
f(5) | 0.0791  | 0.0313
f(6) | 0.0593  | 0.0156
f(7) | 0.0445  | 0.0078
f(8) | 0.0334  |
f(9) | 0.0250  |
f(10) | 0.0188 |

Figure 3.3 Graphs of the p.d.f.'s of the Geometric Distribution with p = 1/4 and p = 1/2


If the r.v. X is Geometrically distributed with parameter p, then:

$$EX = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{q}{p^2}, \quad M_X(t) = \frac{pe^t}{1 - qe^t}, \ t < -\log q. \qquad (22)$$

REMARK 3 Sometimes the p.d.f. of X is given in the form: $f(x) = pq^x$, x =
0, 1, ...; then $EX = \frac{q}{p}$, $\mathrm{Var}(X) = \frac{q}{p^2}$, and $M_X(t) = \frac{p}{1 - qe^t}$, $t < -\log q$.


In reference to Example 11 in Chapter 1, assume for mathematical
convenience that the number of cars passing by may be infinite. Then the r.v.
X described there has the Geometric distribution with some p. Here
probabilities are easily calculated. For example, $P(X \ge 20) = \sum_{x=20}^{\infty}pq^{x-1} =
pq^{19}\sum_{x=0}^{\infty}q^x = pq^{19}\frac{1}{1-q} = q^{19}$; i.e., $P(X \ge 20) = q^{19}$. For instance, if p = 0.01,
then q = 0.99 and $P(X \ge 20) \approx 0.826$.
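The tail of a Geometric r.v. has the closed form $P(X \ge k) = q^{k-1}$, since the first k - 1 trials must all be failures. The sketch below compares this closed form, for k = 20 and p = 0.01, with one minus the sum of the first 19 probabilities $pq^{x-1}$; `geom_pmf` is a hypothetical helper name.

```python
def geom_pmf(x, p):
    """Geometric probability p q^(x-1), x = 1, 2, ..., with q = 1 - p."""
    return p * (1 - p) ** (x - 1)

p = 0.01
q = 1 - p

tail_closed = q ** 19                                         # closed form for P(X >= 20)
tail_summed = 1 - sum(geom_pmf(x, p) for x in range(1, 20))   # 1 - P(X <= 19)

print(tail_closed, tail_summed)
```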


Poisson Distribution A r.v. X taking on the values 0, 1, ... with respective
probabilities given in (23) is said to have the Poisson distribution with
parameter $\lambda$; its distribution is called the Poisson distribution with parameter
$\lambda$. That X is Poisson distributed with parameter $\lambda$ is denoted by X ~ P($\lambda$).
$$P(X = x) = f(x) = e^{-\lambda}\frac{\lambda^x}{x!}, \quad x = 0, 1, \ldots, \ \lambda > 0. \qquad (23)$$
The graph of f depends on $\lambda$; for example, for $\lambda = 5$, it looks like that in
Figure 3.4. That f is a p.d.f. follows from the formula $\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{\lambda}$.


Values of the p.d.f. f of the P(5) distribution

f(0) = 0.0067     f(9) = 0.0363
f(1) = 0.0337     f(10) = 0.0181
f(2) = 0.0843     f(11) = 0.0082
f(3) = 0.1403     f(12) = 0.0035
f(4) = 0.1755     f(13) = 0.0013
f(5) = 0.1755     f(14) = 0.0005
f(6) = 0.1462     f(15) = 0.0001
f(7) = 0.1044
f(8) = 0.0653     f(n) is negligible for n ≥ 16.

Figure 3.4 Graph of the p.d.f. of the Poisson Distribution with $\lambda = 5$


For selected values of $\lambda$, the d.f. $F(k) = \sum_{j=0}^{k}e^{-\lambda}\frac{\lambda^j}{j!}$ is given by tables, the
Poisson tables. The individual values $e^{-\lambda}\frac{\lambda^j}{j!}$ are found by subtraction.
Alternatively, such probabilities can be calculated recursively (see Exercise 3.20). It
is not hard to see (see Exercises 3.21 and 3.22) that, if X ~ P($\lambda$), then:

$$EX = \lambda, \quad \mathrm{Var}(X) = \lambda, \quad \text{and} \quad M_X(t) = e^{\lambda e^t - \lambda}, \ t \in \mathbb{R}. \qquad (24)$$


From these expressions, the parameter A acquires a special meaning: it is both 
the mean and the variance of the r.v. X. 

Example 12 in Chapter 1 may serve as an illustration of the usage of the 
Poisson distribution. Assuming, for mathematical convenience, that the num- 
ber of bacteria may be infinite, then the Poisson distribution may be used to 
describe the actual distribution of bacteria (for a suitable value of $\lambda$) quite
accurately. There is a host of similar cases for the description of which the 
Poisson distribution is appropriate. These include the number of telephone 
calls served by a certain telephone exchange center within a certain period of 
time, the number of particles emitted by a radioactive source within a certain 
period of time, the number of typographical errors in a book, etc. 

There is an intimate relationship between the Poisson and the Binomial
distributions: the former may be obtained as the limit of the latter, as
explained in the following. Namely, it is seen (see Exercise 3.23) that in the
Binomial, B(n, p), situation, if n is large and p is small, then the Binomial
probabilities $\binom{n}{x}p^x(1 - p)^{n-x}$ are close to the Poisson probabilities $e^{-np}\frac{(np)^x}{x!}$.
More precisely, $\binom{n}{x}p_n^x(1 - p_n)^{n-x} \to e^{-\lambda}\frac{\lambda^x}{x!}$, provided $n \to \infty$ and $p_n \to 0$
so that $np_n \to \lambda \in (0, \infty)$. Here $p_n$ is the probability of a success in the
nth trial. Thus, for large values of n, $\binom{n}{x}p_n^x(1 - p_n)^{n-x} \approx e^{-\lambda}\frac{\lambda^x}{x!}$; or, upon
replacing $\lambda$ by $np_n$, we obtain the approximation mentioned before. A rough
explanation of why Poisson probabilities are approximated by Binomial
probabilities is given next. To this end, suppose an event A occurs once in a
small time interval h with approximate probability proportional to h and
coefficient of proportionality $\lambda$; i.e., A occurs once in h with approximate
probability $\lambda h$. It occurs two or more times with probability approximately
0, so that it occurs zero times with probability approximately $1 - \lambda h$.
Finally, occurrences in nonoverlapping intervals of length h are independent.
Next, consider the unit interval (0, 1] and divide it into a large number n
of nonoverlapping subintervals of equal length h: $(t_{i-1}, t_i]$, i = 1, ..., n, $t_0 =
0$, $t_n = 1$, $h = \frac{1}{n}$. With the ith interval $(t_{i-1}, t_i]$, associate the r.v. $X_i$
defined by: $X_i = 1$ with approximate probability $\lambda h$ and 0 with approximate
probability $1 - \lambda h$. Then the r.v. $X = \sum_{i=1}^{n}X_i$ denotes the number of
occurrences of A over the unit (0, 1] interval with approximate probabilities
$\binom{n}{x}(\lambda h)^x(1 - \lambda h)^{n-x}$. The exact probabilities are found by letting $n \to \infty$ (which
implies $h \to 0$). Because here $p_n = \lambda h$ and $np_n = n\lambda h = n\lambda\frac{1}{n} = \lambda$, we have
that $\binom{n}{x}(\lambda h)^x(1 - \lambda h)^{n-x} \to e^{-\lambda}\frac{\lambda^x}{x!}$, as $n \to \infty$ (by Exercise 3.23), so that the
exact probabilities are $e^{-\lambda}\frac{\lambda^x}{x!}$. So, the exact probability that A occurs x times
in (0, 1] is the Poisson probability $e^{-\lambda}\frac{\lambda^x}{x!}$, and the approximate probability
that A occurs the same number of times is the Binomial probability $\binom{n}{x}(\lambda h)^x
(1 - \lambda h)^{n-x}$; these two probabilities are close to each other for large n.
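The convergence of Binomial probabilities with $p_n = \lambda/n$ to the corresponding Poisson probability can be watched numerically. The sketch below is an illustration only: the values $\lambda = 2$ and x = 3, the sequence of n's, and the helper names are arbitrary choices, not taken from the text.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial probability C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability e^{-lam} lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

lam, x = 2.0, 3  # illustrative choices

# Absolute gap between B(n, lam/n) and P(lam) at the point x, for growing n
gaps = [abs(binom_pmf(x, n, lam / n) - poisson_pmf(x, lam))
        for n in (10, 100, 1000, 10000)]

for n, gap in zip((10, 100, 1000, 10000), gaps):
    print(n, gap)
```

In this run the gap shrinks roughly in proportion to 1/n, which is consistent with the limiting statement but is of course not a proof of it.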

The following example sheds some light on the approximation just 
discussed. 


If X is a r.v. distributed as B(25, 1/16), we find from the Binomial tables that
P(X = 2) = 0.2836. Next, considering a r.v. Y distributed as P($\lambda$) with $\lambda =
\frac{25}{16} = 1.5625$, we have that $P(Y = 2) = e^{-1.5625}\frac{(1.5625)^2}{2!} \approx 0.2556$. Thus, the exact
probability is underestimated by the amount 0.028. The error committed is of
the order of 9.87%. Given the small value of n = 25, the approximate value is
not bad at all.


Hypergeometric Distribution This distribution occurs quite often and is
suitable in describing situations of the following type: m identical objects
(e.g., balls) are thoroughly mixed with n identical objects (which again can
be thought of as being balls) but distinct from the m objects. From these
m + n objects, r are drawn without replacement, and let X be the number
among the r which come from the m objects. Then the r.v. X takes on the
values 0, 1, ..., min(r, m) with respective probabilities given below. Actually,
by defining $\binom{m}{x} = 0$ for x > m, we have:
$$P(X = x) = f(x) = \frac{\binom{m}{x}\binom{n}{r-x}}{\binom{m+n}{r}}, \quad x = 0, \ldots, r; \qquad (25)$$

m and n may be referred to as the parameters of the distribution. By assuming
that the selections of r objects out of the m + n are all equally likely, there
are $\binom{m+n}{r}$ ways of selecting these r objects, whereas there are $\binom{m}{x}$ ways of
selecting x out of the m objects, and $\binom{n}{r-x}$ ways of selecting the remaining r - x
objects out of n objects. Thus, the probability that X = x is as given in the
preceding formula. The simple justification that these probabilities actually
sum to 1 follows from Exercise 5.12 in Chapter 2. For large values of any
one of m, n, and r, actual calculation of the probabilities in (25) may be quite
involved. A recursion formula (see Exercise 3.26) facilitates significantly these
calculations. The calculation of the expectation and of the variance of X is
based on the same ideas as those used in Exercise 3.10 in calculating the
EX and Var(X) when X ~ B(n, p). We omit the details and give the relevant
formulas, namely,

$$EX = \frac{mr}{m+n}, \qquad \mathrm{Var}(X) = \frac{mnr(m+n-r)}{(m+n)^2(m+n-1)}.$$

Finally, by utilizing ideas and arguments similar to those employed in
Exercise 3.23, it is shown that as m and $n \to \infty$ so that $\frac{m}{m+n} \to p \in (0, 1)$, then
$\binom{m}{x}\binom{n}{r-x}\big/\binom{m+n}{r}$ tends to $\binom{r}{x}p^x(1 - p)^{r-x}$. Thus, for large values of m and n,
the Hypergeometric probabilities $\binom{m}{x}\binom{n}{r-x}\big/\binom{m+n}{r}$ may be approximated by the
simpler Binomial probabilities $\binom{r}{x}p_{m,n}^x(1 - p_{m,n})^{r-x}$, where $p_{m,n} = \frac{m}{m+n}$.

As an application of formula (25) and the approximation discussed, take m =
70, n = 90, r = 25 and x = 10. Then:

$$f(10) = \frac{\binom{70}{10}\binom{90}{25-10}}{\binom{70+90}{25}} = \frac{\binom{70}{10}\binom{90}{15}}{\binom{160}{25}} \approx 0.166,$$
after quite a few calculations. On the other hand, since $\frac{m}{m+n} = \frac{70}{160} = \frac{7}{16}$, the
Binomial tables give for the B(25, 7/16) distribution: $\binom{25}{10}\big(\frac{7}{16}\big)^{10}\big(\frac{9}{16}\big)^{15} = 0.15$.


Therefore, the approximation underestimates the exact probability by the
amount 0.016. The error committed is of the order of 10.7%.
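With exact integer arithmetic the "quite a few calculations" reduce to a few lines. The sketch below evaluates the Hypergeometric probability of formula (25) for m = 70, n = 90, r = 25, x = 10 together with its Binomial approximation with p = m/(m + n). The function names are hypothetical, and the printed values need not coincide to the last digit with the rounded figures quoted in the text.

```python
from math import comb

def hypergeom_pmf(x, m, n, r):
    """Formula (25): C(m, x) C(n, r - x) / C(m + n, r); math.comb returns 0 when x > m."""
    return comb(m, x) * comb(n, r - x) / comb(m + n, r)

def binom_pmf(x, r, p):
    """Binomial probability C(r, x) p^x (1 - p)^(r - x)."""
    return comb(r, x) * p ** x * (1 - p) ** (r - x)

m, n, r, x = 70, 90, 25, 10

exact = hypergeom_pmf(x, m, n, r)
approx = binom_pmf(x, r, m / (m + n))   # p = 70/160 = 7/16

# The Hypergeometric probabilities over x = 0, ..., r should sum to 1
total = sum(hypergeom_pmf(k, m, n, r) for k in range(r + 1))

print(exact, approx, total)
```

Because `math.comb` works with exact integers, the only rounding occurs in the final division, which makes this a convenient cross-check on hand or table calculations.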


3.3.2 The Continuous Case


Gamma Distribution For its introduction, a certain function, the
so-called Gamma function, is to be defined first. It is shown that the integral
$\int_0^\infty y^{\alpha-1}e^{-y}\,dy$ is finite for $\alpha > 0$ and thus defines a function (in $\alpha$), namely,

$$\Gamma(\alpha) = \int_0^\infty y^{\alpha-1}e^{-y}\,dy, \quad \alpha > 0. \qquad (26)$$

This is the Gamma function. By means of the Gamma function, the Gamma
distribution is defined as follows:

$$f(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}, \quad x > 0, \ \alpha > 0, \ \beta > 0; \qquad (27)$$

$\alpha$ and $\beta$ are the parameters of the distribution. That the function f integrates
to 1 is an immediate consequence of the definition of $\Gamma(\alpha)$. A r.v. X taking on
values in $\mathbb{R}$ and having p.d.f. f, given in (27), is said to be Gamma distributed
with parameters $\alpha$ and $\beta$; one may choose the notation X ~ $\Gamma(\alpha, \beta)$ to
express this fact. The graph of f depends on $\alpha$ and $\beta$ but is, of course, always
concentrated on $(0, \infty)$. Typical cases for several values of the pair $(\alpha, \beta)$ are
given in Figures 3.5 and 3.6.

Figure 3.5 Graphs of the p.d.f. of the Gamma Distribution for Several Values of $\alpha$, $\beta$

Figure 3.6 Graphs of the p.d.f. of the Gamma Distribution for Several Values of $\alpha$, $\beta$

The Gamma distribution is suitable for describing waiting times between
successive occurrences of a random event and is also used for describing
survival times. In both cases, it provides great flexibility through its two
parameters $\alpha$ and $\beta$. For specific values of the pair $(\alpha, \beta)$, we obtain the Negative
Exponential and Chi-Square distributions to be studied subsequently. By
integration by parts, one may derive the following useful recursive relation for the
Gamma function (see Exercise 3.27):

$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1). \qquad (28)$$


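In particular, if $\alpha$ is an integer, repeated applications of recursive relation (28)
produce

$$\Gamma(\alpha) = (\alpha - 1)(\alpha - 2) \cdots \Gamma(1).$$
But $\Gamma(1) = \int_0^\infty e^{-y}\,dy = 1$, so that
$$\Gamma(\alpha) = (\alpha - 1)(\alpha - 2) \cdots 1 = (\alpha - 1)!. \qquad (29)$$

For later reference, we mention here (see also Exercise 3.45) that, by
integration, we obtain:

$$\Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}, \qquad (30)$$

and then, by means of this and the recursive formula (28), we can calculate
$\Gamma(\frac{3}{2})$, $\Gamma(\frac{5}{2})$, etc. Finally, by integration (see Exercises 3.28 and 3.29), it is seen
that:

$$EX = \alpha\beta, \quad \mathrm{Var}(X) = \alpha\beta^2, \quad \text{and} \quad M_X(t) = \frac{1}{(1 - \beta t)^\alpha}, \ t < \frac{1}{\beta}. \qquad (31)$$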
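The properties of the Gamma function stated above, namely the recursion $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$, the factorial identity $\Gamma(\alpha) = (\alpha - 1)!$ for integer $\alpha$, and $\Gamma(1/2) = \sqrt{\pi}$, can be spot-checked with the Gamma function from Python's standard library. This is a numerical illustration at a few arbitrarily chosen points, not a derivation.

```python
from math import gamma, pi, sqrt, factorial, isclose

# Recursion: Gamma(a) = (a - 1) * Gamma(a - 1), checked at a non-integer point
a = 3.7
recursion_ok = isclose(gamma(a), (a - 1) * gamma(a - 1))

# Integer arguments: Gamma(n) = (n - 1)!
factorial_ok = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 11))

# Gamma(1/2) = sqrt(pi), and then Gamma(3/2) from the recursion
gamma_half = gamma(0.5)
gamma_three_halves = 0.5 * gamma_half

print(recursion_ok, factorial_ok, gamma_half, gamma_three_halves)
```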


The lifetime of certain equipment is described by a r.v. X whose distribution
is Gamma with parameters $\alpha = 2$ and $\beta = \frac{1}{3}$, so that the corresponding p.d.f.
is: $f(x) = 9xe^{-3x}$, for x > 0. Determine the expected lifetime, the variation
around it, and the probability that the lifetime is at least 1 unit of time.


DISCUSSION Since $EX = \alpha\beta$ and $\mathrm{Var}(X) = \alpha\beta^2$, we have here: $EX = \frac{2}{3}$
and $\mathrm{Var}(X) = \frac{2}{9}$. Also,

$$P(X > 1) = \int_1^\infty 9xe^{-3x}\,dx = \frac{4}{e^3} \approx 0.199.$$

Negative Exponential Distribution In (27), set $\alpha = 1$ and $\beta = \frac{1}{\lambda}$ ($\lambda > 0$)
to obtain:

$$f(x) = \lambda e^{-\lambda x}, \quad x > 0, \ \lambda > 0. \qquad (32)$$

This is the so-called Negative Exponential distribution with parameter $\lambda$. The
graph of f(x) depends on $\lambda$ but, typically, looks as in Figure 3.7.


Figure 3.7 Graph of the Negative Exponential p.d.f. with $\lambda = 1$




















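For a r.v. X having the Negative Exponential distribution with parameter $\lambda$,
formulas (31) give:

$$EX = \frac{1}{\lambda}, \quad \mathrm{Var}(X) = \frac{1}{\lambda^2}, \quad \text{and} \quad M_X(t) = \frac{\lambda}{\lambda - t}, \ t < \lambda. \qquad (33)$$

The expression $EX = \frac{1}{\lambda}$ provides special significance for the parameter $\lambda$: its
inverse value is the mean of X. This fact also suggests the reparameterization
of f; namely, set $\frac{1}{\lambda} = \mu$, in which case:

$$f(x) = \frac{1}{\mu}e^{-x/\mu}, \ x > 0, \quad EX = \mu, \quad \mathrm{Var}(X) = \mu^2, \quad \text{and} \quad M_X(t) = \frac{1}{1 - \mu t}, \ t < \frac{1}{\mu}. \qquad (34)$$

From (32), one finds by a simple integration:
$$F(x) = 1 - e^{-\lambda x}, \ x > 0, \quad \text{so that} \quad P(X > x) = e^{-\lambda x}, \ x > 0. \qquad (35)$$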


The Negative Exponential distribution is used routinely as a survival
distribution; namely, as describing the lifetime of a piece of equipment, etc., put in service
at what may be termed as time zero. As such, it exhibits a lack of memory
property, which may not be desirable in this context. Namely, if one poses the
following question: What is the probability that a piece of equipment will last for t
additional units of time, given that it has already survived s units of time, the
answer (by means of the Negative Exponential distribution) is, by (35):

$$P(X > s + t \mid X > s) = \frac{P(X > s + t, X > s)}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t);$$

i.e., $P(X > s + t \mid X > s) = P(X > t)$, independent of s! Well, in real life, used
pieces of equipment do not exactly behave as brand-new ones! Finally, it is to
be mentioned that the Negative Exponential distribution is the waiting time
distribution between the occurrence of any two successive events following
the Poisson distribution (see also Exercise 3.20(ii) in Chapter 2).
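The lack of memory property can be seen numerically: the conditional survival probability $P(X > s + t \mid X > s)$, computed from the survival function $P(X > x) = e^{-\lambda x}$, does not change with s. In the sketch below the values of $\lambda$, t, and the list of s's are arbitrary illustrative choices, and the function names are hypothetical.

```python
from math import exp, isclose

def survival(x, lam):
    """P(X > x) = e^{-lam x}, x > 0, for the Negative Exponential distribution."""
    return exp(-lam * x)

def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)

lam, t = 0.5, 2.0  # illustrative parameter and additional time

# The same additional time t, after different amounts s of elapsed time
values = [conditional_survival(s, t, lam) for s in (0.5, 1.0, 5.0, 20.0)]

print(values, survival(t, lam))
```

All entries of `values` coincide (up to rounding) with the unconditional probability P(X > t), whatever the elapsed time s.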


The lifetime of an automobile battery is described by a r.v. X having the
Negative Exponential distribution with parameter $\lambda = \frac{1}{3}$. Then:


(i) Determine the expected lifetime of the battery and the variation around
this mean.
(ii) Calculate the probability that the lifetime will be between 2 and 4 time
units.
(iii) If the battery has lasted for 3 time units, what is the (conditional)
probability that it will last for at least an additional time unit?


DISCUSSION


(i) Since $EX = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, we have here: EX = 3, Var(X) = 9, and
s.d.(X) = 3.

(ii) Since, by (35), $F(x) = 1 - e^{-x/3}$ for x > 0, we have $P(2 < X < 4) = P(2 <
X \le 4) = P(X \le 4) - P(X \le 2) = F(4) - F(2) = (1 - e^{-4/3}) - (1 - e^{-2/3}) =
e^{-2/3} - e^{-4/3} \approx 0.250$.

(iii) The required probability is: $P(X > 4 \mid X > 3) = P(X > 1)$, by the
memoryless property of this distribution, and $P(X > 1) = 1 - P(X \le 1) =
1 - F(1) = e^{-1/3} \approx 0.716$.


Chi-Square Distribution In formula (27), set $\alpha = \frac{r}{2}$ for a positive integer
r and $\beta = 2$ to obtain:

$$f(x) = \frac{1}{\Gamma(\frac{r}{2})2^{r/2}}x^{(r/2)-1}e^{-x/2}, \quad x > 0, \ r > 0 \text{ integer}. \qquad (36)$$
The resulting distribution is known as the Chi-Square distribution with r
degrees of freedom (d.f.). This distribution is used in certain statistical inference
problems involving confidence intervals for variances and testing hypotheses
about variances. The notation used for a r.v. X having the Chi-Square
distribution with r d.f. is X ~ $\chi^2_r$. For such a r.v., formulas (31) then become:

$$EX = r, \quad \mathrm{Var}(X) = 2r \ (\text{both easy to remember}) \quad \text{and} \quad M_X(t) = \frac{1}{(1 - 2t)^{r/2}}, \ t < \frac{1}{2}. \qquad (37)$$
The shape of the graph of f depends on r, and, typically, looks like that in
Figure 3.8.
Later on (see Corollary to Theorem 5 in Chapter 5), it will be seen why r is
referred to as the number of d.f. of the distribution.


Figure 3.8 Graph of the p.d.f. of the Chi-Square Distribution for Several Values of r

Figure 3.9 Graph of the p.d.f. of the Normal Distribution with $\mu = 1.5$ and Several Values of $\sigma$


Normal Distribution This is by far the most important distribution, in
both probability and statistics. The reason for this is twofold: First, many
observations do follow to a very satisfactory degree a Normal distribution
(see, for instance, Examples 13-17 in Chapter 1); and second, no matter what
the underlying distribution of observations is, the sum of sufficiently many
observations behaves pretty much as if it were normally distributed, under very
mild conditions. This second property is referred to as normal approximation
or as the Central Limit Theorem and will be revisited later on (see Section 7.2
in Chapter 7). The p.d.f. of a Normal distribution is given by:


$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad x \in \mathbb{R}, \ \mu \in \mathbb{R}, \ \sigma > 0;$ (38)


$\mu$ and $\sigma^2$ (or $\sigma$) are referred to as the parameters of the distribution. The
graph of f depends on $\mu$ and $\sigma$; typical cases for $\mu = 1.5$ and various values
of $\sigma$ are given in Figure 3.9.

No matter what $\mu$ and $\sigma$ are, the curve representing f attains its maximum
at $x = \mu$ and this maximum is equal to $1/(\sqrt{2\pi}\,\sigma)$, is symmetric around $\mu$ (i.e.,
$f(\mu - x) = f(\mu + x)$), and f(x) tends to 0 as $x \to \infty$ or $x \to -\infty$. All
these observations follow immediately from formula (38). That the function
f(x) integrates to 1 is seen through a technique involving a double integral
and polar coordinates (see Exercise 3.44). For $\mu = 0$ and $\sigma = 1$, formula (38)
is reduced to:


$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad x \in \mathbb{R},$ (39)


and this is referred to as the standard Normal distribution (see Figure 3.10 
for its graph). 


Figure 3.10 Graph of the p.d.f. of the standard Normal Distribution

The fact that a r.v. X is Normally distributed with parameters $\mu$ and $\sigma^2$
(or $\sigma$) is conveniently denoted by: $X \sim N(\mu, \sigma^2)$. In particular, $X \sim N(0, 1)$
for $\mu = 0$, $\sigma = 1$. We often use the notation Z for a N(0, 1) distributed r.v.

The d.f. of the N(0, 1)-distribution is usually denoted by $\Phi$; i.e., if
$Z \sim N(0, 1)$, then:

$P(Z \le x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt, \quad x \in \mathbb{R}.$ (40)

Calculations of probabilities of the form P(a < X < b) for $-\infty < a < b < \infty$
are done through two steps: First, turn the r.v. $X \sim N(\mu, \sigma^2)$ into a N(0, 1)-
distributed r.v., or, as we say, standardize it, and then use available tables,
the Normal tables (see Proposition 3). The standardization is based on the
following simple result.


PROPOSITION 3 If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma}$ is $\sim N(0, 1)$.


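PROOF Indeed, for $y \in \mathbb{R}$,

$$\begin{aligned}
F_Z(y) = P(Z \le y) &= P\left(\frac{X - \mu}{\sigma} \le y\right) \\
&= P(X \le \mu + \sigma y) \\
&= \int_{-\infty}^{\mu + \sigma y} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(t-\mu)^2/(2\sigma^2)}\, dt.
\end{aligned}$$

Set $\frac{t - \mu}{\sigma} = z$, so that $t = \mu + \sigma z$ with range from $-\infty$ to y, and $dt = \sigma\, dz$, to
obtain:

$$\begin{aligned}
F_Z(y) &= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-z^2/2}\, \sigma\, dz \\
&= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz, \quad \text{so that} \\
f_Z(y) &= \frac{d}{dy} F_Z(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2},
\end{aligned}$$

which is the p.d.f. of the N(0, 1) distribution. ▲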


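Thus, if $X \sim N(\mu, \sigma^2)$ and a, b are as above, then:

$$\begin{aligned}
P(a < X < b) &= P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) \\
&= P\left(Z \le \frac{b-\mu}{\sigma}\right) - P\left(Z \le \frac{a-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right).
\end{aligned}$$

That is,

$P(a < X < b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right).$ (41)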


Any other probabilities (involving intervals) can be found by way of probability
(40) by exploiting the symmetry (around 0) of the N(0, 1) curve.
Now, if $Z \sim N(0, 1)$, it is clear that $EZ^{2n+1} = 0$ for n = 0, 1, ...; by
integration by parts, the following recursive relation is also easily established:

$m_{2n} = (2n - 1)\, m_{2n-2}, \quad \text{where } m_k = \int_{-\infty}^{\infty} x^k\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$ (42)

from which it follows that EZ = 0 and $EZ^2 = 1$, so that Var(Z) = 1. (For
details, see Exercise 3.48.)
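If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$, so that (by properties (9) and (10)):


$0 = EZ = \frac{EX}{\sigma} - \frac{\mu}{\sigma}, \quad 1 = Var(Z) = \frac{1}{\sigma^2} Var(X), \quad \text{or} \quad EX = \mu \ \text{and} \ Var(X) = \sigma^2.$

In other words:
If $X \sim N(\mu, \sigma^2)$, then $EX = \mu$ and $Var(X) = \sigma^2$. (43)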


Thus, the parameters $\mu$ and $\sigma^2$ have specific interpretations: $\mu$ is the mean of
X and $\sigma^2$ is its variance (so that $\sigma$ is its s.d.).
If $Z \sim N(0, 1)$, it is seen from the Normal tables that:


P(-1 < Z < 1) = 0.68269, P(-2 < Z < 2) = 0.95450,
P(-3 < Z < 3) = 0.99730,


so that almost all of the probability mass lies within 3 standard deviations from
the mean. The same is true, by means of formula (41), applied with $a = \mu - k\sigma$
and $b = \mu + k\sigma$ with k = 1, 2, 3 in case $X \sim N(\mu, \sigma^2)$. That is:
$P(\mu - \sigma < X < \mu + \sigma) = 0.68269$, $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.95450$,
$P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.99730$.
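
The standardization step and the three probabilities above can be reproduced without tables. The sketch below assumes the standard identity $\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}(x/\sqrt{2})\right)$; the helper names `phi` and `normal_prob`, and the values chosen for the mean and the s.d., are illustrative assumptions.

```python
import math


def phi(x):
    """Standard Normal d.f., written through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2): standardize, then difference the d.f."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


mu, sigma = 1.5, 2.0  # arbitrary illustrative parameters
within = {k: normal_prob(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}
for k, p in within.items():
    print(k, round(p, 5))
```

Changing `mu` and `sigma` leaves the three probabilities unchanged, which is exactly what the standardization argument predicts.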


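Finally, simple integration produces the m.g.f. of X (see also Exercise 3.46),
namely,

$M_X(t) = e^{\mu t + \sigma^2 t^2/2}, \quad t \in \mathbb{R}, \quad \text{for } X \sim N(\mu, \sigma^2),$
$M_Z(t) = e^{t^2/2}, \quad t \in \mathbb{R}, \quad \text{for } Z \sim N(0, 1).$ (44)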

As will be seen in subsequent chapters, the Normal distribution is widely 
used in problems of statistical inference, involving point estimation, interval 
estimation, and testing hypotheses. Some instances where the Normal distri- 
bution is assumed as an appropriate (approximate) underlying distribution are 
described in Examples 13-17 in Chapter 1, as mentioned already. 


Suppose that numerical grades in a statistics class are values of a r.v. X which
is (approximately) Normally distributed with mean $\mu = 65$ and s.d. $\sigma = 15$.
Furthermore, suppose that letter grades are assigned according to the following
rule: the student receives an A if $X \ge 85$; B if $70 \le X < 85$; C if
$55 \le X < 70$; D if $45 \le X < 55$; and F if $X < 45$.


(i) If a student is chosen at random from that class, calculate the probability
that the student will earn a given letter grade.
(ii) Identify the expected proportions of letter grades to be assigned.


DISCUSSION 


(i) The student earns an A with probability $P(X \ge 85) = 1 - P(X < 85) =
1 - P\left(\frac{X - \mu}{\sigma} < \frac{85 - 65}{15}\right) \simeq 1 - P(Z < 1.34) \simeq 1 - \Phi(1.34) = 1 - 0.909877 =
0.090123 \simeq 0.09$. Likewise, the student earns a B with probability $P(70 \le
X < 85) = P\left(\frac{70 - 65}{15} \le \frac{X - \mu}{\sigma} < \frac{85 - 65}{15}\right) \simeq P(0.34 \le Z < 1.34) = \Phi(1.34) -
\Phi(0.34) = 0.909877 - 0.633072 = 0.276805 \simeq 0.277$. Similarly, the student
earns a C with probability $P(55 \le X < 70) \simeq \Phi(0.34) + \Phi(0.67) - 1 =
0.381643 \simeq 0.382$. The student earns a D with probability $P(45 \le X <
55) \simeq \Phi(1.34) - \Phi(0.67) = 0.161306 \simeq 0.161$, and the student is assigned
an F with probability $P(X < 45) \simeq \Phi(-1.34) = 1 - \Phi(1.34) = 0.090123 \simeq
0.09$.

(ii) The respective expected proportions for A, B, C, D, and F are: 9%, 28%, 
38%, 16%, and 9%. 











Indeed, suppose there are n students, and let $X_A$ be the number of those
whose numerical grades are $\ge 85$. By assuming that the n events that the
numerical grade of each one of the n students is $\ge 85$ are independent, we have
that $X_A \sim B(n, 0.09)$. Then, $\frac{X_A}{n}$ is the proportion of A grades, and $E\left(\frac{X_A}{n}\right) =
\frac{1}{n} \times n \times 0.09 = 0.09 = 9\%$ is the expected proportion of A's. Likewise for the other
grades.
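
The grade probabilities can also be computed without rounding the standardized cut-offs to two decimals. The sketch below uses `statistics.NormalDist` from the Python standard library; the dictionary `probs` and the variable names are illustrative. Because the cut-offs 20/15, 5/15, and 10/15 are not rounded here, the individual probabilities differ from the table-based ones in the third decimal, while the rounded percentages agree.

```python
from statistics import NormalDist

X = NormalDist(mu=65, sigma=15)  # grade distribution of the example
F = X.cdf                        # d.f. of X

probs = {
    "A": 1 - F(85),      # X >= 85
    "B": F(85) - F(70),  # 70 <= X < 85
    "C": F(70) - F(55),  # 55 <= X < 70
    "D": F(55) - F(45),  # 45 <= X < 55
    "F": F(45),          # X < 45
}

# Expected proportions of letter grades, rounded to whole percentages.
percent = {grade: round(100 * p) for grade, p in probs.items()}

for grade in probs:
    print(grade, round(probs[grade], 4), f"{percent[grade]}%")
```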
This section is concluded with a simple distribution, the Uniform (or
Rectangular) distribution.


Uniform (or Rectangular) Distribution Such a distribution is restricted
to finite intervals between the parameters $\alpha$ and $\beta$ with $-\infty < \alpha < \beta < \infty$,
and its p.d.f. is given by:


$f(x) = \frac{1}{\beta - \alpha}, \quad \alpha \le x \le \beta \quad (-\infty < \alpha < \beta < \infty).$ (45)


Its graph is given in Figure 3.11, and it also justifies its name as rectangular.


Figure 3.11 Graph of the p.d.f. of the $U(\alpha, \beta)$ Distribution

The term "uniform" is justified by the fact that intervals of equal length
in $(\alpha, \beta)$ are assigned the same probability regardless of their location. The
notation used for such a distribution is $U(\alpha, \beta)$ (or $R(\alpha, \beta)$), and the fact that
the r.v. X is distributed as such is denoted by $X \sim U(\alpha, \beta)$ (or $X \sim R(\alpha, \beta)$).
Simple integrations give (see also Exercise 3.51):

$EX = \frac{\alpha + \beta}{2}, \quad Var(X) = \frac{(\alpha - \beta)^2}{12}, \quad \text{and} \quad M_X(t) = \frac{e^{t\beta} - e^{t\alpha}}{t(\beta - \alpha)}, \quad t \in \mathbb{R}.$ (46)

A bus is supposed to arrive at a given bus stop at 10:00 a.m., but the actual time 
of arrival is a r.v. X which is uniformly distributed over the 16-minute interval
from 9:52 to 10:08. If a passenger arrives at the bus stop at exactly 9:50, what is 
the probability that the passenger will board the bus no later than 10 minutes 
from the time of his/her arrival? 


DISCUSSION The p.d.f. of X is f(x) = 1/16 for x ranging between 9:52 
and 10:08, and 0 otherwise. The passenger will board the bus no later than 10 
minutes from the time of his/her arrival at the bus stop if the bus arrives at 
the bus stop between 9:52 and 10:00 (as the passenger will necessarily have 
to wait for 2 minutes, between 9:50 and 9:52). The probability for the bus to 
arrive between 9:52 and 10:00 is 8/16 = 0.5. This is the required probability. 
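
The bus computation amounts to dividing the length of an overlap by the length of the whole interval. A minimal sketch follows, measuring time in minutes relative to 10:00 (so 9:52 is -8 and 10:08 is 8); this time scale and the names `uniform_prob`, `p_board`, `mean`, and `variance` are assumptions made for illustration.

```python
from fractions import Fraction

# Arrival time in minutes relative to 10:00, uniform on [-8, 8].
alpha, beta = Fraction(-8), Fraction(8)


def uniform_prob(c, d, alpha, beta):
    """P(c <= X <= d) for X uniform on [alpha, beta]: overlap length over total length."""
    lo, hi = max(c, alpha), min(d, beta)
    return max(hi - lo, Fraction(0)) / (beta - alpha)


# The passenger arrives at 9:50 (-10) and waits at most 10 minutes (until 10:00).
p_board = uniform_prob(Fraction(-10), Fraction(0), alpha, beta)

mean = (alpha + beta) / 2              # mean of the arrival time
variance = (alpha - beta) ** 2 / 12    # variance of the arrival time

print(p_board, mean, variance)
```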


REMARK 4 It has been stated (see comments after relation (7)) that sometimes
a handful of moments of a r.v. X completely determine the distribution
of X. Actually, this has been the case in all seven distributions examined
in this section. In the Binomial distribution, knowledge of the mean amounts
to knowledge of p, and hence of f. The same is true in the Geometric distribution,
as well as in the Poisson distribution. In the Hypergeometric distribution,
knowledge of the first two moments, or equivalently, of the mean and variance
of X (see expressions for the expectation and variance), determines m and n
and hence the distribution itself. The same is true of the Gamma distribution,
as well as the Normal and Uniform distributions.


3.1 If $X \sim B(n, p)$ with p > 0.5, the Binomial tables (in this book) cannot
be used, even if n is suitable. This problem is resolved by the following
result.

(i) If $X \sim B(n, p)$, show that $P(X = x) = P(Y = n - x)$, where $Y \sim
B(n, q)$ (q = 1 - p).
(ii) Apply part (i) for n = 20, p = 0.625, and x = 8.


3.2 Let X be a r.v. distributed as B(n, p), and recall that $P(X = x) = f(x) =
\binom{n}{x} p^x q^{n-x}$, x = 0, 1, ..., n (q = 1 - p). Set B(n, p; x) = f(x).
(i) By using the relationship: $\binom{n+1}{x} = \binom{n}{x} + \binom{n}{x-1}$ (see Exercise 5.11 in
Chapter 2), show that:


$B(n + 1, p; x) = p\, B(n, p; x - 1) + q\, B(n, p; x).$


(ii) By using this recursive relation of $B(n + 1, p; \cdot)$, calculate the
probabilities B(n, p; x) for n = 26, p = 0.25, and x = 10.


3.3 Someone buys one ticket in each one of 50 lotteries, and suppose that
each ticket has probability 1/100 of winning a prize. Compute the
probability that the person in question will win a prize:

(i) Exactly once.
(ii) At least once.


3.4 Suppose that 15 people, chosen at random from a target population, are
asked if they favor a certain proposal. If 43.75% of the target population
favor the proposal, calculate the probability that:

(i) At least 5 of the 15 polled favor the proposal.
(ii) A majority of those polled favor the proposal.


3.5 A fair die is tossed independently 18 times, and the appearance of a 6 is
called a success. Find the probability that:
(i) The number of successes is greater than the number of failures.
(ii) The number of successes is twice as large as the number of failures.
(iii) The number of failures is 3 times the number of successes.


3.6 Suppose you are throwing darts at a target and that you hit the bull's eye
with probability p. It is assumed that the trials are independent and that
p remains constant throughout.
(i) If you throw darts 100 times, what is the probability that you hit the
bull's eye at least 40 times?
(ii) What does this expression become for p = 0.25?
(iii) What is the expected number of hits, and what is the s.d. around this
expected number?


3.7 If $X \sim B(100, 1/4)$, use the Tchebichev inequality to determine a lower
bound for the probability: $P(|X - 25| < 10)$.


3.8 A manufacturing process produces defective items at the constant (but
unknown to us) proportion p. Suppose that n items are sampled independently
and let X be the r.v. denoting the number of defective items
among the n, so that $X \sim B(n, p)$. Use the Tchebichev inequality in
order to determine the smallest value of the sample size n, so that:
$P\left(\left|\frac{X}{n} - p\right| < 0.05\sqrt{pq}\right) \ge 0.95$ (q = 1 - p).

3.9 If $X \sim B(n, p)$ show that $f(x + 1) = \frac{p}{q} \cdot \frac{n - x}{x + 1}\, f(x)$, x = 0, 1, ..., n - 1
(q = 1 - p).

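3.10 If $X \sim B(n, p)$:


(i) Calculate the EX and the E[X(X - 1)].
(ii) Use part (i) and Exercise 1.6 to calculate the Var(X).


Hint: For part (i), observe that:

$$\begin{aligned}
EX &= \sum_{x=1}^{n} x\, \frac{n(n-1)!}{x(x-1)!\,(n-x)!}\, p^x q^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{(n-1)-(x-1)} \\
&= np \sum_{y=0}^{n-1} \binom{n-1}{y} p^{y} q^{(n-1)-y} = np,
\end{aligned}$$

and

$$\begin{aligned}
E[X(X-1)] &= \sum_{x=2}^{n} x(x-1)\, \frac{n(n-1)(n-2)!}{x(x-1)(x-2)!\,(n-x)!}\, p^x q^{n-x} \\
&= n(n-1)p^2 \sum_{x=2}^{n} \binom{n-2}{x-2} p^{x-2} q^{(n-2)-(x-2)} \\
&= n(n-1)p^2 \sum_{y=0}^{n-2} \binom{n-2}{y} p^{y} q^{(n-2)-y} = n(n-1)p^2.
\end{aligned}$$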


3.11 If $X \sim B(n, p)$:
(i) Show that $M_X(t) = (pe^t + q)^n$, $t \in \mathbb{R}$ (q = 1 - p).
(ii) Use part (i) to rederive the EX and the Var(X).


3.12 Let the r.v. X have the Geometric p.d.f. $f(x) = pq^{x-1}$, x = 1, 2, ... (q =
1 - p).
(i) What is the probability that the first success will occur by the 10th
trial?
(ii) What is the numerical value of this probability for p = 0.2?


3.13 A manufacturing process produces defective items at the rate of 1%. Let X
be the r.v. denoting the number of trials required until the first defective
item is produced. Then calculate the probability that X is not larger
than 10.


3.14 A fair die is tossed repeatedly until a six appears for the first time. Cal- 
culate the probability that: 
(i) This happens on the 3rd toss. 
(ii) At least 5 tosses will be needed. 


3.15 A coin with probability p of falling heads is tossed repeatedly and
independently until the first head appears.
(i) Determine the smallest number of tosses, n, required to have the first
head appearing by the nth time with prescribed probability $\alpha$.
(ii) Determine the value of n for $\alpha = 0.95$, and p = 0.25 (q = 0.75) and
p = 0.50 (= q).


3.16 If X has the Geometric distribution; i.e., $f(x) = pq^{x-1}$, for x = 1, 2, ...
(q = 1 - p):


(i) Calculate the EX and the E[X(X - 1)].
(ii) Use part (i) and Exercise 1.6 to calculate the Var(X).


Hint: Refer to the comments made just before the Exercises in 
Section 3.1 (of this chapter). 


3.17 If X has the Geometric distribution, then:
(i) Derive the m.g.f. of X and specify the range of its argument.
(ii) Employ the m.g.f. in order to derive: EX, $EX^2$, and Var(X).


3.18 Suppose that r.v. X is distributed as $P(\lambda)$; i.e., $f(x) = e^{-\lambda} \frac{\lambda^x}{x!}$, for x =
0, 1, ..., and that f(2) = 2 f(0). Determine the value of the parameter $\lambda$.


3.19 Let X be a Poisson distributed r.v. with parameter $\lambda$, and suppose that
P(X = 0) = 0.1. Calculate the probability P(X = 5).


3.20 If $X \sim P(\lambda)$, show that: $f(x + 1) = \frac{\lambda}{x + 1}\, f(x)$, x = 0, 1, ....


3.21 If $X \sim P(\lambda)$:
(i) Calculate the EX and the E[X(X - 1)].
(ii) Use part (i) and Exercise 1.6 to calculate the Var(X).


Hint: For part (i), observe that:

$EX = e^{-\lambda} \sum_{x=1}^{\infty} x\, \frac{\lambda \cdot \lambda^{x-1}}{x(x-1)!} = \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda,$ and

$E[X(X - 1)] = \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} x(x-1)\, \frac{\lambda^{x-2}}{x(x-1)(x-2)!} = \lambda^2 e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = \lambda^2.$


3.22 If $X \sim P(\lambda)$:


(i) Show that $M_X(t) = e^{\lambda(e^t - 1)}$, $t \in \mathbb{R}$.
(ii) Use the m.g.f. to rederive the EX and the Var(X).




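3.23 For n = 1, 2, ..., let the r.v. $X_n \sim B(n, p_n)$ where, as $n \to \infty$, $0 < p_n \to 0$,
and $np_n \to \lambda \in (0, \infty)$. Then show that:


$\binom{n}{x} p_n^x q_n^{n-x} \to e^{-\lambda} \frac{\lambda^x}{x!} \quad \text{as } n \to \infty \quad (q_n = 1 - p_n).$


Hint: Write $\binom{n}{x}$ as $n(n - 1) \cdots (n - x + 1)/x!$, set $np_n = \lambda_n$, so that
$p_n = \frac{\lambda_n}{n} \to 0$ and $q_n = 1 - p_n = 1 - \frac{\lambda_n}{n} \to 1$. Group terms suitably,
take the limit as $n \to \infty$, and use the calculus fact that $\left(1 + \frac{x_n}{n}\right)^n \to e^x$
when $x_n \to x$ as $n \to \infty$.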


3.24 In an undergraduate statistics class of 80, 10 of the students are, actually,
graduate students. If 5 students are chosen at random from the class,
what is the probability that:

(i) No graduate students are included?
(ii) At least 3 undergraduate students are included?


3.25 Suppose a geologist has collected 15 specimens of a certain rock, call it
$R_1$, and 10 specimens of another rock, call it $R_2$. A laboratory assistant
selects randomly 15 specimens for analysis, and let X be the r.v. denoting
the number of specimens of rock $R_1$ selected for analysis.

(i) Specify the p.d.f. of the r.v. X.
(ii) What is the probability that at least 10 specimens of the rock $R_1$ are
included in the analysis?
(iii) What is the probability that all specimens come from the rock $R_2$?


3.26 If the r.v. X has the Hypergeometric distribution; i.e., $P(X = x) = f(x) =
\frac{\binom{m}{x}\binom{n}{r-x}}{\binom{m+n}{r}}$, x = 0, 1, ..., r, then show that:


$f(x + 1) = \frac{(m - x)(r - x)}{(n - r + x + 1)(x + 1)}\, f(x).$

Hint: Start with f(x + 1) and write the numerator in terms of factorials.
Then modify suitably some terms, and regroup them to arrive at
the expression on the right-hand side.





3.27 By using the definition of $\Gamma(\alpha)$ by (26) and integrating by parts, show
that: $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$, $\alpha > 1$.


3.28 Let the r.v. X have the Gamma distribution with parameters $\alpha$ and $\beta$. Then:
(i) Show that: $EX = \alpha\beta$, $Var(X) = \alpha\beta^2$.
(ii) As a special case of part (i), show that: If X has the Negative
Exponential distribution with parameter $\lambda$, then $EX = \frac{1}{\lambda}$, $Var(X) = \frac{1}{\lambda^2}$.
(iii) If $X \sim \chi^2_r$, then EX = r, Var(X) = 2r.


3.29 If the r.v. X is distributed as Gamma with parameters $\alpha$ and $\beta$, then:
(i) Show that $M_X(t) = 1/(1 - \beta t)^{\alpha}$, provided $t < 1/\beta$.
(ii) Use the m.g.f. to rederive the EX and the Var(X).


3.30 Let X be a r.v. denoting the lifetime of a certain component of a system,
and suppose that X has the Negative Exponential distribution with
parameter $\lambda$. Also, let g(x) be the cost of operating this equipment to
time X = x.
(i) Compute the expected cost of operation over the lifetime of the
component under consideration, when:
(a) g(x) = cx, where c is a positive constant,
(b) $g(x) = c(1 - 0.5e^{-\alpha x})$, where $\alpha$ is a positive constant.

(ii) Specify the numerical values in part (i) when $\lambda = 1/5$, c = 2, and
$\alpha = 0.2$.


3.31 If the r.v. X has the Negative Exponential p.d.f. with parameter $\lambda$, calculate
the failure rate r(x) defined by: $r(x) = \frac{f(x)}{1 - F(x)}$, for x > 0, where F is
the d.f. of X.





3.32 Suppose that certain events occur in a time interval of length t according to
the Poisson distribution with parameter $\lambda t$. Then show that the waiting time
between any two such successive events is a r.v. T which has the Negative
Exponential distribution with parameter $\lambda$, by showing that $P(T > t) =
e^{-\lambda t}$, t > 0.


3.33 Let X be the r.v. denoting the number of particles arriving independently 
at a detector at the average rate of 3 per second, and let Y be the r.v. denot- 
ing the waiting time between two successive arrivals. Refer to Exercise 
3.32 in order to calculate: 

(i) The probability that the first particle will arrive within 1 second. 

(ii) Given that we have waited for 1 second since the arrival of the last 
particle without a new arrival, what is the probability that we have 
to wait for at least another second? 


3.34 Let X be a r.v. with p.d.f. $f(x) = \alpha\beta x^{\beta - 1} e^{-\alpha x^{\beta}}$, for x > 0 (where the
parameters $\alpha$ and $\beta$ are > 0). This is the so-called Weibull distribution
employed in describing the lifetime of living organisms or of mechanical
systems.

(i) Show that f is, indeed, a p.d.f.
(ii) For what values of the parameters does f become a Negative
Exponential p.d.f.?
(iii) Calculate the quantities: EX, $EX^2$, and Var(X).


Hint: For part (i), observe that: $\int_0^{\infty} \alpha\beta x^{\beta-1} e^{-\alpha x^{\beta}}\, dx = \int_0^{\infty} e^{-\alpha x^{\beta}} \times
(\alpha\beta x^{\beta-1})\, dx = -\int_0^{\infty} d e^{-\alpha x^{\beta}} = -e^{-\alpha x^{\beta}} \Big|_0^{\infty} = 1.$

For part (iii), set $\alpha x^{\beta} = t$, so that $x = t^{1/\beta}/\alpha^{1/\beta}$, $dx = \left(t^{(1/\beta)-1}/\beta\alpha^{1/\beta}\right) dt$
and $0 < t < \infty$. Then:


$EX^n = \frac{1}{\alpha^{n/\beta}} \int_0^{\infty} t^{\left(\frac{n}{\beta}+1\right)-1} e^{-t}\, dt.$


Then multiply and divide by the constant $\Gamma\left(\frac{n}{\beta} + 1\right)$ and observe that
$\frac{1}{\Gamma\left(\frac{n}{\beta}+1\right)}\, t^{\left(\frac{n}{\beta}+1\right)-1} e^{-t}$ (t > 0) is a Gamma p.d.f. with parameters $\frac{n}{\beta} + 1$ and 1.


3.35 In reference to Exercise 3.34, calculate:
(i) The failure rate $r(x) = \frac{f(x)}{1 - F(x)}$, x > 0, where F is the d.f. of the
r.v. X.
(ii) The conditional probability $P(X > s + t \mid X > t)$, s > 0, t > 0.
(iii) Compare the results in parts (i) and (ii) with the respective results
in Exercise 3.31 here, and Exercise 3.20(ii) in Chapter 2.


3.36 If $\Phi$ is the d.f. of the r.v. $Z \sim N(0, 1)$, show that:
(i) For $0 \le a < b$, $P(a < Z < b) = \Phi(b) - \Phi(a)$.
(ii) For $a \le 0 < b$, $P(a < Z < b) = \Phi(-a) + \Phi(b) - 1$.
(iii) For $a \le b < 0$, $P(a < Z < b) = \Phi(-a) - \Phi(-b)$.
(iv) For c > 0, $P(-c < Z < c) = 2\Phi(c) - 1$.


3.37 If the r.v. $Z \sim N(0, 1)$, use the Normal tables in the appendix to verify
that:
(i) P(-1 < Z < 1) = 0.68269.
(ii) P(-2 < Z < 2) = 0.9545.
(iii) P(-3 < Z < 3) = 0.9973.


3.38 (i) If the r.v. X is distributed as $N(\mu, \sigma^2)$, identify the constant c, in terms
of $\mu$ and $\sigma$, for which:


$P(X < c) = 2 - 9P(X > c).$
(ii) What is the numerical value of c for $\mu = 5$ and $\sigma = 2$?


3.39 For any r.v. X with expectation $\mu$ and variance $\sigma^2$ (both finite), use the
Tchebichev inequality to determine a lower bound for the probabilities:
$P(|X - \mu| < k\sigma)$, for k = 1, 2, 3. Compare these bounds with the respective
probabilities when $X \sim N(\mu, \sigma^2)$ (see Exercise 3.37).


3.40 The distribution of I.Q.'s of the people in a given group is approximated
well by the Normal distribution with $\mu = 105$ and $\sigma = 20$. What proportion
of the individuals in the group in question has an I.Q.:
(i) At least 50?
(ii) At most 80?
(iii) Between 95 and 125?


3.41 A certain manufacturing process produces light bulbs whose life length
(in hours) is a r.v. X distributed as Normal with $\mu = 2{,}000$ and $\sigma = 200$.
A light bulb is supposed to be defective if its lifetime is less than 1,800. If
25 light bulbs are tested, what is the probability that at most 15 of them
are defective? (Use the required independence.)


3.42 A manufacturing process produces 1/2-inch ball bearings, which are
assumed to be satisfactory if their diameter lies in the interval $0.5 \pm
0.0006$ and defective otherwise. A day's production is examined, and it
is found that the distribution of the actual diameters of the ball bearings
is approximately Normal with $\mu = 0.5007$ inch and $\sigma = 0.0005$ inch.
What would you expect the proportion of defective ball bearings to be
equal to?


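3.43 Let f be the p.d.f. of the $N(\mu, \sigma^2)$ distribution. Then show that:
(i) f is symmetric about $\mu$.
(ii) $\max_{x \in \mathbb{R}} f(x) = 1/(\sqrt{2\pi}\,\sigma)$.


3.44 (i) Show that $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $x \in \mathbb{R}$, is a p.d.f.
(ii) Use part (i) in order to show that $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$, $x \in \mathbb{R}$ ($\mu \in
\mathbb{R}$, $\sigma > 0$) is also a p.d.f.

Hint: Set $I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, dx$ and show that $I^2 = 1$, by writing $I^2$ as
a product of two integrals and then as a double integral; at this point,
use polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$, $0 < r < \infty$, $0 \le \theta < 2\pi$.
Part (ii) is reduced to part (i) by letting $\frac{x - \mu}{\sigma} = y$.


3.45 Refer to the definition of $\Gamma(\alpha)$ by (26) and show that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.


3.46 (i) If $X \sim N(0, 1)$, show that $M_X(t) = e^{t^2/2}$, $t \in \mathbb{R}$.
(ii) If $X \sim N(\mu, \sigma^2)$, use part (i) to show that $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$, $t \in \mathbb{R}$.
(iii) Employ the m.g.f. in part (ii) in order to show that $EX = \mu$ and
$Var(X) = \sigma^2$.


3.47 If the r.v. X has m.g.f. $M_X(t) = e^{\alpha t + \beta t^2}$, where $\alpha \in \mathbb{R}$ and $\beta > 0$, identify
the distribution of X.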


3.48 If $X \sim N(0, 1)$, show that:
(i) $EX^{2n+1} = 0$ and $EX^{2n} = \frac{(2n)!}{2^n (n!)}$, n = 0, 1, ...
(ii) From part (i), derive that EX = 0 and Var(X) = 1.
(iii) Employ part (ii) in order to show that, if $X \sim N(\mu, \sigma^2)$, then $EX = \mu$
and $Var(X) = \sigma^2$.


Hint: For part (i), that $EX^{2n+1} = 0$ follows by the fact that the integrand
is an odd function. For $EX^{2n}$, establish a recursive relation,
integrating by parts, and then multiply out the resulting recursive relations
to find an expression for $EX^{2n}$. The final form follows by simple
manipulations. For part (iii), recall that $X \sim N(\mu, \sigma^2)$ implies $\frac{X - \mu}{\sigma} \sim N(0, 1)$.


3.49 Let X be a r.v. with moments given by:

$EX^{2n+1} = 0, \quad EX^{2n} = \frac{(2n)!}{2^n (n!)}, \quad n = 0, 1, \ldots$

(i) Use Exercise 1.17 in order to express the m.g.f. of X in terms of the
moments given.
(ii) From part (i) and Exercise 3.46(i), conclude that $X \sim N(0, 1)$.


3.50 If the r.v. X is distributed as U(-a, a) (a > 0), determine the parameter
a, so that each of the following equalities holds:
(i) P(-1 < X < 2) = 0.75.
(ii) P(|X| < 1) = P(|X| > 2).


3.51 If $X \sim U(\alpha, \beta)$, show that $EX = \frac{\alpha + \beta}{2}$, $Var(X) = \frac{(\alpha - \beta)^2}{12}$.


3.52 If the r.v. X is distributed as U(0, 1), compute the expectations:
(i) $E(3X^2 - 7X + 2)$.
(ii) $E(2e^X)$.

3.4 Median and Mode of a Random Variable


Although the mean of a r.v. X does specify the center of location of the dis- 
tribution of X, sometimes this is not what we actually wish to know. A case 
in point is the distribution of yearly income in a community (e.g., in a state 
or in a country). For the sake of illustration, consider the following (rather) 
extreme example. A community consisting of 100 households comprises 10 
households with yearly income $400,000 each and 90 households with yearly 
income $12,000 each. Defining the r.v. X to take the values 400,000 and 12,000 
with respective probabilities 0.10 and 0.90, we obtain: EX = 50,800. Thus, the 
average yearly income in this community would be $50,800, significantly above 
the national average yearly income, which would indicate a rather prosperous 
community. The reality, however, is that this community is highly stratified, 
and the expectation does not reveal this characteristic. What is more appro- 
priate for cases like this are numerical characteristics of a distribution known 
as median or, more generally, percentiles or quantiles. 
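
The gap between the mean and the median in this community can be seen directly from the 100 household incomes. This is a small sketch using the `statistics` module of the Python standard library; the list `incomes` simply encodes the example's data.

```python
from statistics import mean, median

# 10 households at $400,000 and 90 households at $12,000, as in the example.
incomes = [400_000] * 10 + [12_000] * 90

average_income = mean(incomes)   # pulled upward by the 10 high incomes
median_income = median(incomes)  # the income of a "typical" household

print(average_income, median_income)
```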

The median of the distribution of a r.v. X is usually defined as a point,
denoted by $x_{0.50}$, for which


$P(X \le x_{0.50}) \ge 0.50$ and $P(X \ge x_{0.50}) \ge 0.50,$ (47)
or, equivalently,

$P(X < x_{0.50}) \le 0.50$ and $P(X \le x_{0.50}) \ge 0.50.$ (48)


If the underlying distribution is continuous, the median is (essentially)
unique and may be simply defined by:


$P(X \le x_{0.50}) = P(X \ge x_{0.50}) = 0.50.$ (49)


However, in the discrete case, relation (47) (or (48)) may not define the
median in a unique manner, as the following example shows.


EXAMPLE 13 Examine the median of the r.v. X distributed as follows.


x    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10
f(x) | 2/32 | 1/32 | 5/32 | 3/32 | 4/32 | 1/32 | 2/32 | 6/32 | 2/32 | 6/32


DISCUSSION We have $P(X \le 6) = 16/32 = 0.50 \ge 0.50$ and $P(X \ge 6) =
17/32 > 0.50 \ge 0.50$, so that (47) is satisfied. Also,


$P(X \le 7) = 18/32 > 0.50 \ge 0.50$ and $P(X \ge 7) = 16/32 = 0.50 \ge 0.50$,


so that (47) is satisfied again. However, if we define the median as the point
(6 + 7)/2 = 6.5, then $P(X \le 6.5) = P(X \ge 6.5) = 0.50$, as (47) requires, and
the median is uniquely defined.

Relations (47)-(49) and Example 13 suggest the following definition of the
median.

DEFINITION 4

The median of the distribution of a continuous r.v. X is the (essentially)
unique point $x_{0.50}$ defined by (49). For the discrete case, consider two
cases: Let $x_k$ be the value for which $P(X \le x_k) = 0.50$, if such a value
exists. Then the unique median is defined to be the midpoint between
$x_k$ and $x_{k+1}$; i.e., $x_{0.50} = (x_k + x_{k+1})/2$. If there is no such value, the
unique median is defined by the relations: $P(X < x_{0.50}) < 0.50$ and
$P(X \le x_{0.50}) > 0.50$ (or $P(X \le x_{0.50}) > 0.50$ and $P(X \ge x_{0.50}) > 0.50$).


EXAMPLE 14 Determine the median of the r.v. X distributed as follows.


x    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10
f(x) | 2/32 | 1/32 | 2/32 | 6/32 | 4/32 | 2/32 | 1/32 | 7/32 | 1/32 | 6/32


Thus, in Example 14, $x_{0.50} = 6$, because $P(X < 6) = P(X \le 5) = 15/32 <
0.50$ and $P(X \le 6) = 17/32 > 0.50$.


More generally, the pth quantile is defined as follows.


DEFINITION 5
For any p with 0 < p < 1, the pth quantile of the distribution of a r.v. X,
denoted by $x_p$, is defined as follows: If X is continuous, then the (essentially)
unique $x_p$ is defined by:

$P(X \le x_p) = p$ and $P(X \ge x_p) = 1 - p$.

For the discrete case, consider two cases: Let $x_k$ be the value for which
$P(X \le x_k) = p$, if such a value exists. Then the unique pth quantile
is defined to be the midpoint between $x_k$ and $x_{k+1}$; i.e., $x_p = (x_k +
x_{k+1})/2$. If there is no such value, the unique pth quantile is defined
by the relation: $P(X < x_p) < p$ and $P(X \le x_p) > p$ (or $P(X \le x_p) > p$ and
$P(X \ge x_p) > 1 - p$).


Thus, the pth quantile is a point $x_p$, which divides the distribution of X
into two parts, and $(-\infty, x_p]$ contains exactly 100p% (or at least 100p%) of the
distribution, and $[x_p, \infty)$ contains exactly 100(1 - p)% (or at least 100(1 - p)%)
of the distribution of X. For p = 0.50, we obtain the median. These concepts
are illustrated further by the following examples.


EXAMPLE 15 Refer to Figure 3.1 (B(12, 1/4)) and determine $x_{0.25}$, $x_{0.50}$, and $x_{0.75}$.


DISCUSSION Here $x_{0.25} = 2$ since $P(X < 2) = P(X = 0) + P(X = 1) =
0.1584 < 0.25$ and $P(X \le 2) = 0.1584 + P(X = 2) = 0.3907 > 0.25$. Likewise,
$x_{0.50} = 3$ since $P(X < 3) = 0.3907 < 0.50$ and $P(X \le 3) = 0.6488 > 0.50$.
Finally, $x_{0.75} = 4$, since $P(X < 4) = 0.6488 < 0.75$ and $P(X \le 4) = 0.8424 > 0.75$.
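
The discrete case of the quantile definition translates into a short scan of the cumulative probabilities. The sketch below is one possible reading of that definition, using exact fractions so that the test "cumulative probability equals p" is reliable; the function name `quantile` and the variable names are illustrative. It is applied to the tables of Examples 13 and 14 and to the B(12, 1/4) distribution.

```python
from fractions import Fraction
from math import comb


def quantile(values, probs, p):
    """p-th quantile of a discrete r.v. with increasing values and exact probabilities."""
    cum = Fraction(0)
    for k, (x, f) in enumerate(zip(values, probs)):
        cum += f
        if cum == p:
            # P(X <= x_k) = p exactly: take the midpoint of x_k and x_{k+1}.
            return Fraction(x + values[k + 1], 2)
        if cum > p:
            # P(X < x_k) < p < P(X <= x_k): x_k itself is the quantile.
            return Fraction(x)
    raise ValueError("p must satisfy 0 < p < 1")


vals = list(range(1, 11))
ex13 = [Fraction(k, 32) for k in (2, 1, 5, 3, 4, 1, 2, 6, 2, 6)]
ex14 = [Fraction(k, 32) for k in (2, 1, 2, 6, 4, 2, 1, 7, 1, 6)]
median13 = quantile(vals, ex13, Fraction(1, 2))
median14 = quantile(vals, ex14, Fraction(1, 2))

# B(12, 1/4) with exact probabilities.
n, p = 12, Fraction(1, 4)
b_vals = list(range(n + 1))
b_probs = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in b_vals]
quartiles = [quantile(b_vals, b_probs, Fraction(k, 4)) for k in (1, 2, 3)]

print(median13, median14, quartiles)
```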


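EXAMPLE 16 Refer to Figure 3.4 (P(5)) and determine $x_{0.25}$, $x_{0.50}$, and $x_{0.75}$.


As in the previous example, $x_{0.25} = 3$ (since $P(X < 3) = 0.1247 < 0.25$ and
$P(X \le 3) = 0.2650 > 0.25$), $x_{0.50} = 5$ (since $P(X < 5) = 0.4405 < 0.50$ and
$P(X \le 5) = 0.6160 > 0.50$), and $x_{0.75} = 6$ (since $P(X < 6) = 0.6160 < 0.75$ and
$P(X \le 6) = 0.7622 > 0.75$).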
If $X \sim U(0, 1)$, take p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, and 0.90
and determine the corresponding $x_p$.


Here $F(x) = \int_0^x dt = x$, 0 < x < 1. Therefore $F(x_p) = p$ gives $x_p = p$.


If $X \sim N(0, 1)$, take p as in the previous example and determine the
corresponding $x_p$.


From the Normal tables, we obtain: $x_{0.10} = -x_{0.90} = -1.282$, $x_{0.20} = -x_{0.80} =
-0.842$, $x_{0.30} = -x_{0.70} = -0.524$, $x_{0.40} = -x_{0.60} = -0.253$, and $x_{0.50} = 0$.
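
For a continuous distribution the pth quantile solves $F(x_p) = p$, so it can be read off from the inverse d.f. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library for the N(0, 1) case; the names `Z`, `ps`, and `normal_quantiles` are illustrative, and the U(0, 1) case needs no computation since $x_p = p$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

ps = [k / 10 for k in range(1, 10)]  # p = 0.10, 0.20, ..., 0.90
normal_quantiles = [Z.inv_cdf(p) for p in ps]

for p, x in zip(ps, normal_quantiles):
    print(f"p = {p:.2f}   x_p = {x: .3f}")
```

The symmetry $x_p = -x_{1-p}$ shows up as the printed list being antisymmetric around its middle entry.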


Another numerical characteristic which helps shed some light on the distribu- 
tion of a r.v. X is the so-called mode. 


DEFINITION 6 
A mode of the distribution of a r.v. X is any point, if such points exist, 
which maximizes the p.d.f. of X, f. 


A mode, being defined as a maximizing point, is subject to all shortcomings 
of maximization: It may not exist at all; it may exist but is not obtainable in 
closed form; there may be more than one mode (the distribution is a multi- 
modal one). It may also happen that there is a unique mode (unimodal distri- 
bution). Clearly, if a mode exists, it will be of particular importance for discrete 
distributions, as the modes provide the values of the r.v. X which occur with 
the largest probability. With this in mind, we restrict ourselves to two of the 
most popular discrete distributions: the Binomial and the Poisson distribution. 





Let X be B(n, p); that is,


$f(x) = \binom{n}{x} p^x q^{n-x}, \quad 0 < p < 1, \ q = 1 - p, \quad x = 0, 1, \ldots, n.$
Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes
the largest integer which is $\le y$. Then, if (n + 1)p is not an integer, f(x)
has a unique mode at x = m. If (n + 1)p is an integer, then f(x) has two
modes obtained for x = m and x = m - 1.











PROOF For $x \ge 1$, we have

$$\frac{f(x)}{f(x-1)} = \frac{\binom{n}{x} p^x q^{n-x}}{\binom{n}{x-1} p^{x-1} q^{n-x+1}} = \frac{\frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}}{\frac{n!}{(x-1)!\,(n-x+1)!}\, p^{x-1} q^{n-x+1}} = \frac{n-x+1}{x} \cdot \frac{p}{q}.$$

That is,

$\frac{f(x)}{f(x-1)} = \frac{n-x+1}{x} \cdot \frac{p}{q}.$

Hence f(x) > f(x - 1) (f is increasing) if and only if

$(n - x + 1)p > x(1 - p)$ or $np - xp + p > x - xp$ or $(n + 1)p > x$, and
f(x) = f(x - 1) if and only if x = (n + 1)p in case (n + 1)p is an integer.
Thus, if (n + 1)p is not an integer, we have:
$0 < 1 < \cdots < m < (n + 1)p < m + 1 < \cdots < n,$
and, by a slight abuse of notation,
$f(0) < f(1) < \cdots < f(m) > f(m + 1) > \cdots > f(n),$


so that there is a unique mode at m. If (n + 1)p is an integer (= m), then we
have:


$0 < 1 < \cdots < m - 1 < m < m + 1 < \cdots < n,$
and, as above,
$f(0) < f(1) < \cdots < f(m - 1) = f(m) > f(m + 1) > \cdots > f(n),$


so that there are two modes at m and m - 1. ▲





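Let X be $P(\lambda)$; that is,

$f(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda > 0.$
Then, if $\lambda$ is not an integer, f(x) has a unique mode at $x = [\lambda]$. If $\lambda$ is an
integer, then f(x) has two modes obtained for $x = \lambda$ and $x = \lambda - 1$.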








PROOF For $x \ge 1$, we have

$\frac{f(x)}{f(x-1)} = \frac{e^{-\lambda} \lambda^x / x!}{e^{-\lambda} \lambda^{x-1} / (x-1)!} = \frac{\lambda}{x}.$


Hence f(x) > f(x - 1) if and only if $\lambda > x$, and f(x) = f(x - 1) if and only if
$x = \lambda$ in case $\lambda$ is an integer. Thus, if $\lambda$ is not an integer, f(x) keeps increasing
for $x \le [\lambda]$ and then decreases. Thus the maximum of f(x) occurs at $x =
[\lambda]$. If $\lambda$ is an integer, then the maximum occurs at $x = \lambda$. But in this case
f(x) = f(x - 1), which implies that $x = \lambda - 1$ is a second point which gives
the maximum value to the p.d.f. ▲





Let X ~ B(n, p) with $n = 20$ and $p = \frac{1}{4}$. Then $(n+1)p = \frac{21}{4}$ is not an integer
and therefore there is a unique mode. Since $\frac{21}{4} = 5.25$, the mode is $[5.25] = 5$.
The maximum probability is $\binom{20}{5}(0.25)^5(0.75)^{15} \approx 0.2023$. If $n = 15$ and $p = \frac{1}{4}$,
then $(n+1)p = \frac{16}{4} = 4$ and therefore there are two modes; they are 4 and 3.
The respective maximum probability is $\binom{15}{4}(0.25)^4(0.75)^{11} \approx 0.2252$.


Let X ~ $P(\lambda)$ and let $\lambda = 4.5$. Then there is a unique mode which is $[4.5] = 4$.
The respective maximum probability is 0.1898. If, on the other hand, $\lambda = 7$,
then there are two modes 7 and 6. The respective maximum probability is
0.149.
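The maximum probabilities quoted in the two examples above can be recomputed directly from the Binomial and Poisson p.d.f.'s. This is a small sketch using only the Python standard library; the function names `binom_pmf` and `poisson_pmf` are illustrative assumptions.

```python
from math import comb, exp, factorial, floor

def binom_pmf(x, n, p):
    """B(n, p) probability of the value x."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(lambda) probability of the value x."""
    return exp(-lam) * lam**x / factorial(x)

# Binomial examples: n = 20 (single mode at 5) and n = 15 (modes 3 and 4), p = 1/4.
print(round(binom_pmf(5, 20, 0.25), 4))
print(round(binom_pmf(4, 15, 0.25), 4), round(binom_pmf(3, 15, 0.25), 4))

# Poisson examples: lambda = 4.5 (single mode at [4.5] = 4) and lambda = 7 (modes 6 and 7).
print(round(poisson_pmf(floor(4.5), 4.5), 4))
print(round(poisson_pmf(7, 7), 4), round(poisson_pmf(6, 7), 4))
```

When there are two modes, the two printed probabilities on the same line should agree, as the proofs above show.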


4.1 Let X be a r.v. with p.d.f. $f(x) = 3x^2$, for $0 \le x \le 1$.
(i) Calculate the EX and the median of X and compare them.
(ii) Determine the 0.125-quantile of X.


4.2 Let X be a r.v. with p.d.f. $f(x) = x^n$, for $0 \le x \le c$ (n positive integer),
and let $0 < p < 1$. Determine:
(i) The pth quantile $x_p$ of X in terms of n and p.
(ii) The median $x_{0.50}$ for $n = 3$.


4.3 (i) If the r.v. X has p.d.f. $f(x) = \lambda e^{-\lambda x}$, for $x > 0$ ($\lambda > 0$), determine the
pth quantile $x_p$ in terms of $\lambda$ and p.
Gi) What is the numerical value of x, for A = a and p = 0.25? 


4.4 Let X be a r.v. with p.d.f. f given by: 


Cx, -1<x<0 
f=- 0O<x<l 
0, otherwise. 


(i) If it is also given that EX = 0, determine the constants cı and c2. 
(ii) Determine the 4-quantile of the distribution. 
3 


4.5 Let X be a r.v. with d.f. F given in Exercise 2.2 of Chapter 2: 
(i) Determine the mode of the respective p.d.f. f. 
(ii) Show that 5 is the 3 = 0.15625-quantile of the distribution. 


4.6 Two fair and distinct dice are rolled once, and let X be the r.v. denoting
the sum of the numbers shown, so that the possible values of X are:
2, 3, ..., 12.
(i) Derive the p.d.f. f of the r.v. X.
(ii) Compute the EX.
(iii) Find the median of f, as well as its mode.


4.7 Determine the modes of the following p.d.f.’s: 
O f@= GY, r=1,2,.... 
Gi) fv) = (A-a), x = 1,2,...(0 < a < 1). Also, what is the value 
of a? 
ii) SA = gn, v=0,1,.... 
4.8 Let X ~ B(100, 1/4) and suppose you were to bet on the observed value
of X. On which value would you bet?


4.9 In reference to Exercise 3.33, which number(s) of particles arrives within 
1 second with the maximum probability? 


4.10 Let X be a r.v. (of the continuous type) with p.d.f. f symmetric about
a constant c (i.e., $f(c - x) = f(c + x)$ for all x; in particular, if $c = 0$,
then $f(-x) = f(x)$ for all x). Then show that c is the median of X. (As a
by-product of it, we have, for example, that the mean $\mu$ in the $N(\mu, \sigma^2)$
is also the median.)


Hint: Start with $P(X \le c) = \int_{-\infty}^{c} f(x)\,dx$ and, by making a change
of the variable x, show that this last integral equals $\int_{0}^{\infty} f(c - y)\,dy$.
Likewise, $P(X \ge c) = \int_{c}^{\infty} f(x)\,dx$ and a change of the variable x leads
to the integral $\int_{0}^{\infty} f(c + y)\,dy$. Then the use of symmetry completes the
proof.


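4.11 Let X be a r.v. of the continuous type with p.d.f. f, with finite expectation,
and median m, and let c be any constant. Then:
(i) Show that:

\[
E|X - c| = E|X - m| + 2\int_{m}^{c} (c - x) f(x)\,dx.
\]

(ii) Use part (i) to conclude that the constant c which minimizes the
$E|X - c|$ is $c = m$.

Hint: For $m < c$, show that:

\[
|x - c| - |x - m| =
\begin{cases}
c - m, & x < m \\
c + m - 2x, & m \le x \le c \\
m - c, & x > c.
\end{cases}
\]

Then

\[
\begin{aligned}
E|X - c| - E|X - m|
&= \int_{-\infty}^{m} (c - m) f(x)\,dx + \int_{m}^{c} (c + m - 2x) f(x)\,dx + \int_{c}^{\infty} (m - c) f(x)\,dx \\
&= \frac{c - m}{2} + (c + m)\int_{m}^{c} f(x)\,dx - 2\int_{m}^{c} x f(x)\,dx \\
&\quad + (m - c)\int_{m}^{\infty} f(x)\,dx - (m - c)\int_{m}^{c} f(x)\,dx \\
&= \frac{c - m}{2} + \frac{m - c}{2} + 2c\int_{m}^{c} f(x)\,dx - 2\int_{m}^{c} x f(x)\,dx \\
&= 2\int_{m}^{c} (c - x) f(x)\,dx.
\end{aligned}
\]

For $m \ge c$, show that:

\[
|x - c| - |x - m| =
\begin{cases}
c - m, & x < c \\
-c - m + 2x, & c \le x \le m \\
m - c, & x > m.
\end{cases}
\]

Then

\[
\begin{aligned}
E|X - c| - E|X - m|
&= \int_{-\infty}^{c} (c - m) f(x)\,dx + \int_{c}^{m} (-c - m + 2x) f(x)\,dx + \int_{m}^{\infty} (m - c) f(x)\,dx \\
&= (c - m)\int_{-\infty}^{m} f(x)\,dx - (c - m)\int_{c}^{m} f(x)\,dx \\
&\quad - (c + m)\int_{c}^{m} f(x)\,dx + 2\int_{c}^{m} x f(x)\,dx + (m - c)\int_{m}^{\infty} f(x)\,dx \\
&= \frac{c - m}{2} + \frac{m - c}{2} - 2c\int_{c}^{m} f(x)\,dx + 2\int_{c}^{m} x f(x)\,dx \\
&= -2\int_{c}^{m} (c - x) f(x)\,dx = 2\int_{m}^{c} (c - x) f(x)\,dx.
\end{aligned}
\]

Combining the two results, we get

\[
E|X - c| = E|X - m| + 2\int_{m}^{c} (c - x) f(x)\,dx.
\]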


4.12 Let X be a continuous r.v. with pth quantile $x_p$, and let $Y = g(X)$, where
g is a strictly increasing function, so that the inverse $g^{-1}$ exists (and is
also strictly increasing). Let $y_p$ be the pth quantile of the r.v. Y.
(i) Show that $y_p = g(x_p)$.
(ii) If X has the Negative Exponential distribution with $\lambda = 1$, calculate
$x_p$.
(iii) Use parts (i) and (ii) to determine $y_p$ without calculations, where
$Y = e^X$.
(iv) What do parts (ii) and (iii) become for $p = 0.5$?
Joint and Conditional p.d.f.'s, Conditional Expectation and Variance,
Moment Generating Function, Covariance, and Correlation Coefficient


A brief description of the material discussed in this chapter is as follows. In 
the first section, two r.v.'s are considered and the concepts of their joint prob- 
ability distribution, joint d.f., and joint p.d.f. are defined. The basic properties 
of the joint d.f. are given, and a number of illustrative examples are provided. 
On the basis of a joint d.f., marginal d.f.’s are defined. Also, through a joint 
p.d.f., marginal and conditional p.d.f.'s are defined, and illustrative examples 
are supplied. By means of conditional p.d.f.'s, conditional expectations and 
conditional variances are defined and are applied to some examples. These 
things are done in the second section of the chapter. 

In the following section, the expectation is defined for a function of two 
r.v.’s and some basic properties are listed. As a special case, one obtains the 
joint m.g.f. of the r.v.’s involved, and from this, marginal m.g.f.'s are derived. 
Also, as a special case, one obtains the covariance and the correlation coef- 
ficient of two r.v.'s. Their significance is explained, and a basic inequality is 
established regarding the range of their values. Finally, a formula is provided 
for the calculation of the variance of the sum of two r.v.'s. 
In the fourth section of the chapter, many of the concepts, defined for two
r.v.’s in the previous sections, are generalized to k r.v.’s. In the final section, 
three specific multidimensional distributions are introduced, the Multinomial, 
the Bivariate (or 2-dimensional) Normal, and the Multivariate Normal. The 
derivation of marginal and conditional p.d.f.'s of the Multinomial and Bivariate 
Normal distributions is also presented. This section is concluded with a brief 
discussion of the Multivariate Normal distribution. 


4.1 Joint d.f. and Joint p.d.f. of Two Random Variables


In carrying out a random experiment, we are often interested simultaneously
in two outcomes rather than one. Then with each one of these outcomes a r.v. is
associated, and thus we are furnished with two r.v.'s or a 2-dimensional random
vector. Let us denote by (X, Y) the two relevant r.v.'s or the 2-dimensional
random vector. Here are some examples where two r.v.’s arise in a natural 
way. The pair of r.v.'s (X, Y ) denote, respectively: the SAT and GPA scores of 
a student chosen at random from a specified student population; the number 
of customers waiting for service in two lines in your local favorite bank; the 
days of a given year that the Dow Jones Averages closed with a gain and the 
corresponding gains; the number of hours a student spends daily for studying 
and for other activities; the weight and the height of an individual chosen at 
random from a targeted population; the amount of fertilizer used and the yield 
of a certain agricultural commodity; the lifetimes of two components used in 
an electronic system; the dosage of a drug used for treating a certain allergy 
and the number of days a patient enjoys relief. 

We are going to restrict ourselves to the case where both X and Y are
either discrete or of the continuous type. The concepts of probability
distribution, distribution function, and probability density function are defined by
a straightforward generalization of the definition of these concepts in Section
2.2 of Chapter 2. Thus, the joint probability distribution of (X, Y), to be
denoted by $P_{X,Y}$, is defined by: $P_{X,Y}(B) = P[(X, Y) \in B]$, $B \subseteq \Re^2 = \Re \times \Re$,
the 2-dimensional Euclidean space, the plane. In particular, by taking
$B = (-\infty, x] \times (-\infty, y]$, we obtain the joint d.f. of X, Y, to be denoted by $F_{X,Y}$;
namely, $F_{X,Y}(x, y) = P(X \le x, Y \le y)$, $x, y \in \Re$. The d.f. $F_{X,Y}$ has properties
similar to the ones mentioned in the case of a single r.v., namely:


1. $0 \le F_{X,Y}(x, y) \le 1$ for all $x, y \in \Re$.
Whereas it is, clearly, still true that $x_1 \le x_2$ and $y_1 \le y_2$ imply
$F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$, property #2 in the case of a single r.v. may be restated
as follows: $x_1 < x_2$ implies $F_X(x_2) - F_X(x_1) \ge 0$. This property is replaced
here by:

2. The variation of $F_{X,Y}$ over rectangles with sides parallel to the axes, given
in Figure 4.1, is $\ge 0$.

3. $F_{X,Y}$ is continuous from the right (right-continuous); i.e., if $x_n \downarrow x$ and
$y_n \downarrow y$, then $F_{X,Y}(x_n, y_n) \to F_{X,Y}(x, y)$ as $n \to \infty$.


Table 4.1 
4. $F_{X,Y}(+\infty, +\infty) = 1$ and $F_{X,Y}(-\infty, -\infty) = F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$
for any $x, y \in \Re$, where, of course, $F_{X,Y}(+\infty, +\infty)$ is defined to be the
$\lim_{n \to \infty} F_{X,Y}(x_n, y_n)$ as $x_n \uparrow \infty$ and $y_n \uparrow \infty$, and similarly for the remaining
cases.


Figure 4.1 The Variation V of $F_{X,Y}$ over the Rectangle with vertices $(x_1, y_1)$,
$(x_2, y_1)$, $(x_1, y_2)$, $(x_2, y_2)$ Is:
$F_{X,Y}(x_1, y_1) + F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_2) - F_{X,Y}(x_2, y_1)$

















Property #1 is immediate, and property #2 follows by the fact that the variation
of $F_{X,Y}$ as described is simply the probability that the pair (X, Y) lies in the
rectangle of Figure 4.1, or, more precisely, the probability
$P(x_1 < X \le x_2, y_1 < Y \le y_2)$, which, of course, is $\ge 0$; the justification of
properties #3 and #4 is based on Theorem 2 in Chapter 2.

Now, suppose that the r.v.'s X and Y are discrete and take on the values $x_i$
and $y_j$, $i, j \ge 1$, respectively. Then the joint p.d.f. of X and Y, to be denoted by
$f_{X,Y}$, is defined by: $f_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j)$ and $f_{X,Y}(x, y) = 0$ when
$(x, y) \ne (x_i, y_j)$ (i.e., at least one of x or y is not equal to $x_i$ or $y_j$, respectively).
It is then immediate that for $B \subseteq \Re^2$, $P[(X, Y) \in B] = \sum_{(x_i, y_j) \in B} f_{X,Y}(x_i, y_j)$,
and, in particular, $\sum_{(x_i, y_j) \in \Re^2} f_{X,Y}(x_i, y_j) = 1$, and
$F_{X,Y}(x, y) = \sum_{x_i \le x,\, y_j \le y} f_{X,Y}(x_i, y_j)$. In the last relation, $F_{X,Y}$ is expressed in
terms of $f_{X,Y}$. The converse is also possible (as was done in the case of a single
r.v.), but we do not intend to indulge in it. A simple illustrative example,
however, may be in order.





EXAMPLE 1 Each one of the r.v.'s X and Y takes on four values only, 0, 1, 2, 3, with joint
probabilities expressed best in a matrix form as in Table 4.1.


y\x       0      1      2      3     Totals
0        0.05   0.21   0      0      0.26
1        0.20   0.26   0.08   0      0.54
2        0      0.06   0.07   0.02   0.15
3        0      0      0.03   0.02   0.05
Totals   0.25   0.53   0.18   0.04   1











DISCUSSION The r.v.'s X and Y may represent, for instance, the number
of customers waiting for service in two lines in a bank. Then, for example,
for (x, y) with $x = 2$ and $y = 1$, we have
$F_{X,Y}(x, y) = F_{X,Y}(2, 1) = \sum_{u \le 2,\, v \le 1} f_{X,Y}(u, v)
= f_{X,Y}(0, 0) + f_{X,Y}(0, 1) + f_{X,Y}(1, 0) + f_{X,Y}(1, 1) + f_{X,Y}(2, 0) + f_{X,Y}(2, 1)
= 0.05 + 0.20 + 0.21 + 0.26 + 0 + 0.08 = 0.80$; also,
$P(2 \le X \le 3, 0 \le Y \le 2) = f_{X,Y}(2, 0) + f_{X,Y}(2, 1) + f_{X,Y}(2, 2) + f_{X,Y}(3, 0)
+ f_{X,Y}(3, 1) + f_{X,Y}(3, 2) = 0 + 0.08 + 0.07 + 0 + 0 + 0.02 = 0.17$.
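The two sums in the discussion above can be reproduced mechanically from Table 4.1. The sketch below is illustrative only: the names `TABLE`, `f`, `joint_df`, and `prob` are assumptions, and the probabilities are stored in hundredths so that the arithmetic is exact.

```python
from fractions import Fraction as Fr

# Joint p.d.f. of Table 4.1 in hundredths, indexed as TABLE[y][x].
TABLE = [
    [5, 21, 0, 0],
    [20, 26, 8, 0],
    [0, 6, 7, 2],
    [0, 0, 3, 2],
]

def f(x, y):
    """Joint p.d.f. f(x, y) = P(X = x, Y = y)."""
    return Fr(TABLE[y][x], 100)

def joint_df(x, y):
    """Joint d.f. F(x, y) = P(X <= x, Y <= y), by summing the p.d.f."""
    return sum(f(i, j) for i in range(4) for j in range(4) if i <= x and j <= y)

def prob(x_values, y_values):
    """P(X in x_values, Y in y_values) for the listed values."""
    return sum(f(i, j) for i in x_values for j in y_values)

print(joint_df(2, 1))            # F(2, 1)
print(prob([2, 3], [0, 1, 2]))   # P(2 <= X <= 3, 0 <= Y <= 2)
```

Storing the table by rows of y matches its printed layout, which is why the lookup is `TABLE[y][x]` rather than `TABLE[x][y]`.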

Now, suppose that both X and Y are of the continuous type, and, indeed,
a little bit more; namely, there exists a nonnegative function $f_{X,Y}$ defined on
$\Re^2$ such that, for all x and y in $\Re$:
$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(s, t)\,ds\,dt$. Then
for $B \subseteq \Re^2$ (interpret B as a familiar geometric figure in $\Re^2$):
$P[(X, Y) \in B] = \iint_B f_{X,Y}(x, y)\,dx\,dy$, and, in particular,
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$.
The function $f_{X,Y}$ is called the joint p.d.f. of X and Y. Analogously to the
case of a single r.v., the relationship $\frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y) = f_{X,Y}(x, y)$ holds true (for
continuity points (x, y) of $f_{X,Y}$), so that not only does the joint p.d.f. determine
the joint d.f. through an integration process, but the converse is also true; i.e.,
the joint d.f. determines the joint p.d.f. through differentiation. Again, as in the
case of a single r.v., $P(X = x, Y = y) = 0$ for all $x, y \in \Re$; also, if a nonnegative
function f, defined on $\Re^2$, integrates to 1, then there exist two r.v.'s X and Y
for which f is their joint p.d.f.

This section is concluded with a reference to Example 37 in Chapter 1 
where two continuous r.v.’s X and Y arise in a natural manner. Later on (see 
Subsection 4.5.2), it may be stipulated that the joint distribution of X and Y 
is the Bivariate Normal. For the sake of a simpler illustration, consider the 
following example. 


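EXAMPLE 2 Let the r.v.'s X and Y have the joint p.d.f. $f_{X,Y}(x, y) = \lambda_1 \lambda_2 e^{-\lambda_1 x - \lambda_2 y}$,
$x, y > 0$, $\lambda_1, \lambda_2 > 0$. For example, X and Y may represent the lifetimes of
two components in an electronic system. Derive the joint d.f. $F_{X,Y}$.


DISCUSSION The corresponding joint d.f. is:

\[
\begin{aligned}
F_{X,Y}(x, y) &= \int_0^y \int_0^x \lambda_1 \lambda_2 e^{-\lambda_1 s - \lambda_2 t}\,ds\,dt
= \int_0^y \lambda_2 e^{-\lambda_2 t} \Big( \int_0^x \lambda_1 e^{-\lambda_1 s}\,ds \Big) dt \\
&= \int_0^y \lambda_2 e^{-\lambda_2 t} (1 - e^{-\lambda_1 x})\,dt
= (1 - e^{-\lambda_1 x})(1 - e^{-\lambda_2 y})
\end{aligned}
\]

for $x > 0$, $y > 0$, and 0 otherwise. That is,

\[
F_{X,Y}(x, y) = (1 - e^{-\lambda_1 x})(1 - e^{-\lambda_2 y}), \quad x > 0,\ y > 0,
\quad \text{and } F_{X,Y}(x, y) = 0 \text{ otherwise.} \tag{1}
\]

By letting x and $y \to \infty$, we obtain $F_{X,Y}(\infty, \infty) = 1$, which also shows that
$f_{X,Y}$, as given above, is, indeed, a p.d.f., since
$F_{X,Y}(\infty, \infty) = \int_0^\infty \int_0^\infty \lambda_1 \lambda_2 e^{-\lambda_1 s - \lambda_2 t}\,ds\,dt$.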


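EXAMPLE 3 It is claimed that the function $F_{X,Y}$ given by: $F_{X,Y}(x, y) = \frac{1}{16} xy(x + y)$,
$0 \le x \le 2$, $0 \le y \le 2$, is the joint d.f. of the r.v.'s X and Y. Then:

(i) Verify that $F_{X,Y}$ is, indeed, a d.f.
(ii) Determine the corresponding joint p.d.f. $f_{X,Y}$.
(iii) Verify that $f_{X,Y}$ found in part (ii) is, indeed, a p.d.f.
(iv) Calculate the probability: $P(0 \le X \le 1, 1 \le Y \le 2)$.

DISCUSSION
(i) We have to verify the validity of the defining relations 1-4. Clearly, $F_{X,Y}(x, y)$
attains its maximum for $x = y = 2$, which is 1. Since also $F_{X,Y}(x, y) \ge 0$, the
first property holds. Next, for any rectangle as in Figure 4.1, we have:

\[
\begin{aligned}
&16[F_{X,Y}(x_1, y_1) + F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_2) - F_{X,Y}(x_2, y_1)] \\
&\quad = x_1 y_1 (x_1 + y_1) + x_2 y_2 (x_2 + y_2) - x_1 y_2 (x_1 + y_2) - x_2 y_1 (x_2 + y_1) \\
&\quad = x_1^2 y_1 + x_1 y_1^2 + x_2^2 y_2 + x_2 y_2^2 - x_1^2 y_2 - x_1 y_2^2 - x_2^2 y_1 - x_2 y_1^2 \\
&\quad = -x_1^2 (y_2 - y_1) + x_2^2 (y_2 - y_1) - y_1^2 (x_2 - x_1) + y_2^2 (x_2 - x_1) \\
&\quad = (x_2^2 - x_1^2)(y_2 - y_1) + (x_2 - x_1)(y_2^2 - y_1^2) \\
&\quad = (x_1 + x_2)(x_2 - x_1)(y_2 - y_1) + (x_2 - x_1)(y_2 + y_1)(y_2 - y_1) \ge 0,
\end{aligned}
\]

because $x_1 \le x_2$ and $y_1 \le y_2$, so that the second property also holds. The
third property holds because $F_{X,Y}$ is continuous, and hence right-continuous.
Finally, the fourth property holds because as either $x \to -\infty$ or $y \to -\infty$ (in
fact, if either one of them is 0), then $F_{X,Y}$ is 0, and if $x \to \infty$ and $y \to \infty$ (in
fact, if $x = y = 2$), then $F_{X,Y}$ is 1.

(ii) For $0 \le x \le 2$ and $0 \le y \le 2$,

\[
f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} \Big[ \frac{1}{16} xy(x + y) \Big]
= \frac{1}{16} \frac{\partial^2}{\partial x\,\partial y} (x^2 y + x y^2)
= \frac{1}{16} \frac{\partial}{\partial y} (2xy + y^2)
= \frac{1}{16} (2x + 2y) = \frac{1}{8} (x + y);
\]

i.e., $f_{X,Y}(x, y) = \frac{1}{8}(x + y)$, $0 \le x \le 2$, $0 \le y \le 2$. For (x, y) outside the
rectangle $[0, 2] \times [0, 2]$, $f_{X,Y}$ is 0, since $F_{X,Y}$ is constantly either 0 or 1.

(iii) Since $f_{X,Y}$ is nonnegative, all we have to show is that it integrates to 1. In
fact,

\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy
= \int_0^2 \int_0^2 \frac{1}{8}(x + y)\,dx\,dy
= \frac{1}{8} \Big( \int_0^2 \int_0^2 x\,dx\,dy + \int_0^2 \int_0^2 y\,dx\,dy \Big)
= \frac{1}{8} (2 \times 2 + 2 \times 2) = 1.
\]

(iv) Here,

\[
P(0 \le X \le 1, 1 \le Y \le 2) = \int_1^2 \int_0^1 \frac{1}{8}(x + y)\,dx\,dy
= \frac{1}{8} \Big[ \int_1^2 \int_0^1 x\,dx\,dy + \int_1^2 \int_0^1 y\,dx\,dy \Big]
= \frac{1}{8} \Big( \frac{1}{2} \times 1 + 1 \times \frac{3}{2} \Big) = \frac{1}{4}.
\]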
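The probability in part (iv) can also be obtained as the variation of the joint d.f. over the rectangle (property #2), which gives an independent check of the double integral. The sketch below is an illustration under stated assumptions: the names `F`, `variation`, `f`, and `midpoint_integral` are hypothetical, and the midpoint rule is used only as a simple numerical cross-check.

```python
from fractions import Fraction as Fr

def F(x, y):
    """Joint d.f. F(x, y) = xy(x + y)/16, valid on [0, 2] x [0, 2]."""
    return Fr(x) * y * (x + y) / 16

def variation(x1, x2, y1, y2):
    """Variation of F over the rectangle with corners (x1, y1) and (x2, y2)."""
    return F(x1, y1) + F(x2, y2) - F(x1, y2) - F(x2, y1)

def f(x, y):
    """Joint p.d.f. f(x, y) = (x + y)/8 on [0, 2] x [0, 2]."""
    return (x + y) / 8

def midpoint_integral(x1, x2, y1, y2, n=200):
    """Midpoint-rule approximation of the double integral of f."""
    hx, hy = (x2 - x1) / n, (y2 - y1) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(x1 + (i + 0.5) * hx, y1 + (j + 0.5) * hy)
    return total * hx * hy

print(variation(0, 1, 1, 2))          # exact value from the d.f.
print(midpoint_integral(0, 1, 1, 2))  # numerical value from the p.d.f.
print(midpoint_integral(0, 2, 0, 2))  # total mass of the p.d.f.
```

Because f is linear in x and y, the midpoint rule is exact here up to floating-point rounding, so the two approaches should agree closely.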


EXAMPLE 4 If the function $f_{X,Y}$ is given by: $f_{X,Y}(x, y) = c x^2 y$ for $0 < x^2 < y < 1$ (and 0
otherwise):

(i) Determine the constant c, so that $f_{X,Y}$ is a p.d.f.
(ii) Calculate the probability: $P(0 < X < \frac{3}{4}, \frac{1}{4} \le Y < 1)$.

Figure 4.2 Range of the Pair (x, y)

DISCUSSION

(i) Clearly, for the function to be nonnegative, c must be > 0. The actual value
of c will be determined through the relationship:

\[
\iint_{\{(x, y);\ 0 < x^2 < y < 1\}} c x^2 y\,dx\,dy = 1.
\]

The region over which the p.d.f. is positive is the shaded region in Figure 4.2,
determined by a branch of the parabola $y = x^2$, the y-axis, and the line segment
connecting the points (0, 1) and (1, 1). Since for each fixed x with $0 < x < 1$, y
ranges from $x^2$ to 1, we have:

\[
\iint_{\{x^2 < y < 1\}} c x^2 y\,dx\,dy = c \int_0^1 x^2 \Big( \int_{x^2}^1 y\,dy \Big) dx
= \frac{c}{2} \int_0^1 x^2 (1 - x^4)\,dx = \frac{c}{2} \Big( \frac{1}{3} - \frac{1}{7} \Big) = \frac{2c}{21} = 1
\quad \text{and} \quad c = \frac{21}{2}.
\]

(ii) Since $y = x^2 = \frac{1}{4}$ for $x = \frac{1}{2}$, it follows that, for each x with $0 < x \le \frac{1}{2}$,
the range of y is from $\frac{1}{4}$ to 1; on the other hand, for each x with $\frac{1}{2} < x \le \frac{3}{4}$, the
range of y is from $x^2$ to 1 (see Figure 4.3).


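Figure 4.3 Diagram Facilitating Integration

\[
\begin{aligned}
P\Big(0 < X < \frac{3}{4}, \frac{1}{4} \le Y < 1\Big)
&= c \int_0^{1/2} \int_{1/4}^1 x^2 y\,dy\,dx + c \int_{1/2}^{3/4} \int_{x^2}^1 x^2 y\,dy\,dx \\
&= c \int_0^{1/2} x^2 \Big( \int_{1/4}^1 y\,dy \Big) dx + c \int_{1/2}^{3/4} x^2 \Big( \int_{x^2}^1 y\,dy \Big) dx \\
&= \frac{15c}{32} \int_0^{1/2} x^2\,dx + \frac{c}{2} \int_{1/2}^{3/4} x^2 (1 - x^4)\,dx \\
&= \frac{15c}{3 \times 2^8} + \frac{38c}{3 \times 2^8} - \frac{2{,}059c}{7 \times 2^{15}} = c \times \frac{41{,}311}{21 \times 2^{15}} \\
&= \frac{21}{2} \times \frac{41{,}311}{21 \times 2^{15}} = \frac{41{,}311}{2^{16}} = \frac{41{,}311}{65{,}536} \approx 0.63.
\end{aligned}
\]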







1.1 Let X and Y be r.v.'s denoting, respectively, the number of cars and buses
lined up at a stoplight at a given point in time, and suppose their joint
p.d.f. is given by the following table:


y\x 0 1 2 3 4 5 


0 0.025 0.050 0.125 0.150 0.100 0.050 
1 0.015 0.030 0.075 0.090 0.060 0.030 
2 0.010 0.020 0.050 0.060 0.040 0.020 


Calculate the following probabilities: 
(i) There are exactly 4 cars and no buses. 
(ii) There are exactly 5 cars. 
(iii) There is exactly 1 bus. 
(iv) There are at most 3 cars and at least 1 bus. 


1.2 In a sociological project, families with 0, 1, and 2 children are stud- 
ied. Suppose that the numbers of children occur with the following 
frequencies: 


0 children: 30%; 1 child: 40%; 2 children: 30%.


A family is chosen at random from the target population, and let X and 
Y be the r.v.'s denoting the number of children in the family and the 
number of boys among those children, respectively. Finally, suppose that 
P (observing a boy) = P(observing a girl) = 0.5. 

Calculate the joint p.d.f. $f_{X,Y}(x, y) = P(X = x, Y = y)$, $0 \le y \le x$,
$x = 0, 1, 2$.
Hint: Tabulate the joint probabilities as indicated below by utilizing
the formula:


$P(X = x, Y = y) = P(Y = y \mid X = x) P(X = x)$.





y\x 0 1 2 


0 
1 
2 


1.3 If the r.v.'s X and Y have the joint p.d.f. given by:
$f_{X,Y}(x, y) = x + y$, $0 < x < 1$, $0 < y < 1$,
calculate the probability $P(X < Y)$.

1.4 The r.v.'s X and Y have the joint p.d.f. $f_{X,Y}$ given by:

\[
f_{X,Y}(x, y) = \frac{6}{7} \Big( x^2 + \frac{xy}{2} \Big), \quad 0 < x \le 1,\ 0 < y \le 2.
\]

(i) Show that $f_{X,Y}$ is, indeed, a p.d.f.
(ii) Calculate the probability $P(X > Y)$.


1.5 The r.v.'s X and Y have the joint p.d.f. $f_{X,Y}(x, y) = e^{-x-y}$, $x > 0$, $y > 0$.
(i) Calculate the probability $P(X \le Y \le c)$ for some $c > 0$.
(ii) Find the numerical value in part (i) for $c = \log 2$, where log is the
natural logarithm.


1.6 If the r.v.'s X and Y have the joint p.d.f. $f_{X,Y}(x, y) = e^{-x-y}$, for $x > 0$ and
$y > 0$, compute the following probabilities:


(i) $P(X \le x)$; (ii) $P(Y \le y)$; (iii) $P(X < Y)$; (iv) $P(X + Y \le 3)$.


1.7 Let X and Y be r.v.'s jointly distributed with p.d.f. $f_{X,Y}(x, y) = 2/c^2$, for
$0 < x \le y < c$.
Determine the constant c.


1.8 The rv.’s X and Y have the joint p.d.f. fx y given by: 
Fxr(x, y =cye Y, 0<y<xz. 
Determine the constant c. 
1.9 The joint p.d.f. of the r.v.'s X and Y is given by: 
ir, = xyr, O<, 0<y<cz. 


Determine the condition that $c_1$ and $c_2$ must satisfy so that $f_{X,Y}$ is, indeed,
a p.d.f.


1.10 The joint p.d.f. of the r.v.'s X and Y is given by:
$f_{X,Y}(x, y) = cx$, $x > 0$, $y > 0$, $1 \le x + y \le 2$ ($c > 0$).


Determine the constant c.
Hint: The following diagram should facilitate the calculations.


The range of the pair (x, y) is the shadowed area. 























1.11 The r.v.'s X and Y have joint p.d.f. fx y given by: 
xr, y) = cy- xe, -y<a<y, 0<y<oo. 


Determine the constant c. 
In the case of two r.v.'s with joint d.f. $F_{X,Y}$ and joint p.d.f. $f_{X,Y}$, we may define
quantities which were not available in the case of a single r.v. These quantities
are marginal d.f.'s and p.d.f.'s, conditional p.d.f.'s, and conditional expectations
and variances. To this end, consider the joint d.f. $F_{X,Y}(x, y) = P(X \le x, Y \le y)$,
and let $y \to \infty$. Then we obtain $F_{X,Y}(x, \infty) = P(X \le x, Y < \infty) = P(X \le x) = F_X(x)$;
thus, $F_X(x) = F_{X,Y}(x, \infty)$, and likewise, $F_Y(y) = F_{X,Y}(\infty, y)$.
That is, the d.f.'s of the r.v.'s X and Y are obtained from their joint d.f. by
eliminating one of the variables x or y through a limiting process. The d.f.'s
$F_X$ and $F_Y$ are referred to as marginal d.f.'s. If the r.v.'s X and Y are discrete
with joint p.d.f. $f_{X,Y}$, then
$P(X = x_i) = P(X = x_i, -\infty < Y < \infty) = \sum_{y_j \in \Re} f_{X,Y}(x_i, y_j)$;
i.e., $f_X(x_i) = \sum_{y_j \in \Re} f_{X,Y}(x_i, y_j)$, and likewise,
$f_Y(y_j) = \sum_{x_i \in \Re} f_{X,Y}(x_i, y_j)$. Because of this marginalization process, the p.d.f.'s of the
r.v.'s X and Y, $f_X$ and $f_Y$, are referred to as marginal p.d.f.'s. In the continuous
case, $f_X$ and $f_Y$ are obtained by integrating out the "superfluous" variables;
i.e., $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$. The marginal $f_X$
is, indeed, the p.d.f. of X because
$P(X \le x) = P(X \le x, -\infty < Y < \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{X,Y}(s, t)\,dt\,ds
= \int_{-\infty}^{x} \big[ \int_{-\infty}^{\infty} f_{X,Y}(s, t)\,dt \big] ds = \int_{-\infty}^{x} f_X(s)\,ds$; i.e.,
$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(s)\,ds$, so that $\frac{d}{dx} F_X(x) = f_X(x)$, and likewise,
$\frac{d}{dy} F_Y(y) = f_Y(y)$ (for continuity points x and y of $f_X$ and $f_Y$, respectively).

In terms of the joint and the marginal p.d.f.'s, one may define formally the
functions:

\[
f_{X|Y}(x \mid y) = f_{X,Y}(x, y) / f_Y(y) \quad \text{for fixed } y \text{ with } f_Y(y) > 0,
\]

and

\[
f_{Y|X}(y \mid x) = f_{X,Y}(x, y) / f_X(x) \quad \text{for fixed } x \text{ with } f_X(x) > 0.
\]

These nonnegative functions are, actually, p.d.f.'s. For example, for the
continuous case:

\[
\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = \frac{f_Y(y)}{f_Y(y)} = 1,
\]

and similarly for $f_{Y|X}(\cdot \mid x)$; in the discrete case, integrals are replaced by
summation signs. The p.d.f. $f_{X|Y}(\cdot \mid y)$ is called the conditional p.d.f. of X,
given $Y = y$, and $f_{Y|X}(\cdot \mid x)$ is the conditional p.d.f. of Y, given $X = x$. The
motivation for this terminology is as follows: For the discrete case,
$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{P(X = x, Y = y)}{P(Y = y)} = P(X = x \mid Y = y)$;
i.e., $f_{X|Y}(x \mid y)$ does, indeed, stand
for the conditional probability that $X = x$, given that $Y = y$. Likewise for
$f_{Y|X}(\cdot \mid x)$. In the continuous case, the points x and y are to be replaced by
"small" intervals around them.

The concepts introduced so far are now illustrated by means of examples.








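EXAMPLE 5 Refer to Example 1 and derive the marginal and conditional p.d.f.'s involved.


DISCUSSION From Table 4.1, we have: $f_X(0) = 0.25$, $f_X(1) = 0.53$, $f_X(2) = 0.18$,
and $f_X(3) = 0.04$; also, $f_Y(0) = 0.26$, $f_Y(1) = 0.54$, $f_Y(2) = 0.15$, and
$f_Y(3) = 0.05$. Thus, the probability that there are 2 people in line one, for
instance, regardless of how many people are in the other line, is:
$P(X = 2) = f_X(2) = 0.18$. Next,

$f_{X|Y}(0 \mid 0) = \frac{0.05}{0.26} = \frac{5}{26} \approx 0.192$, $f_{X|Y}(1 \mid 0) = \frac{0.21}{0.26} = \frac{21}{26} \approx 0.808$,
$f_{X|Y}(2 \mid 0) = 0$, $f_{X|Y}(3 \mid 0) = 0$;
$f_{X|Y}(0 \mid 1) = \frac{0.20}{0.54} = \frac{10}{27} \approx 0.370$, $f_{X|Y}(1 \mid 1) = \frac{0.26}{0.54} = \frac{13}{27} \approx 0.481$,
$f_{X|Y}(2 \mid 1) = \frac{0.08}{0.54} = \frac{4}{27} \approx 0.148$, $f_{X|Y}(3 \mid 1) = 0$;
$f_{X|Y}(0 \mid 2) = 0$, $f_{X|Y}(1 \mid 2) = \frac{0.06}{0.15} = \frac{2}{5} = 0.40$,
$f_{X|Y}(2 \mid 2) = \frac{0.07}{0.15} = \frac{7}{15} \approx 0.467$, $f_{X|Y}(3 \mid 2) = \frac{0.02}{0.15} = \frac{2}{15} \approx 0.133$;
$f_{X|Y}(0 \mid 3) = 0$, $f_{X|Y}(1 \mid 3) = 0$, $f_{X|Y}(2 \mid 3) = \frac{0.03}{0.05} = \frac{3}{5} = 0.60$,
$f_{X|Y}(3 \mid 3) = \frac{0.02}{0.05} = \frac{2}{5} = 0.40$.

Likewise for $f_{Y|X}(\cdot \mid \cdot)$. Thus,

$f_{Y|X}(0 \mid 0) = 0.2$, $f_{Y|X}(1 \mid 0) = 0.8$, $f_{Y|X}(2 \mid 0) = f_{Y|X}(3 \mid 0) = 0$;
$f_{Y|X}(0 \mid 1) = \frac{21}{53} \approx 0.396$, $f_{Y|X}(1 \mid 1) = \frac{26}{53} \approx 0.491$,
$f_{Y|X}(2 \mid 1) = \frac{6}{53} \approx 0.113$, $f_{Y|X}(3 \mid 1) = 0$;
$f_{Y|X}(0 \mid 2) = 0$, $f_{Y|X}(1 \mid 2) = \frac{8}{18} \approx 0.444$,
$f_{Y|X}(2 \mid 2) = \frac{7}{18} \approx 0.389$, $f_{Y|X}(3 \mid 2) = \frac{3}{18} \approx 0.167$;
$f_{Y|X}(0 \mid 3) = f_{Y|X}(1 \mid 3) = 0$, $f_{Y|X}(2 \mid 3) = f_{Y|X}(3 \mid 3) = 0.5$.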
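The marginal and conditional p.d.f.'s listed above follow from Table 4.1 by row and column sums and by division. A small sketch of that bookkeeping is given below; the dictionary layout and the function names `f_X_given_Y` and `f_Y_given_X` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction as Fr

# Joint p.d.f. of Table 4.1 in hundredths, indexed as TABLE[y][x].
TABLE = [[5, 21, 0, 0], [20, 26, 8, 0], [0, 6, 7, 2], [0, 0, 3, 2]]
joint = {(x, y): Fr(TABLE[y][x], 100) for x in range(4) for y in range(4)}

# Marginal p.d.f.'s: sum the joint p.d.f. over the other variable.
f_X = {x: sum(joint[x, y] for y in range(4)) for x in range(4)}
f_Y = {y: sum(joint[x, y] for x in range(4)) for y in range(4)}

def f_X_given_Y(x, y):
    """Conditional p.d.f. of X, given Y = y (requires f_Y(y) > 0)."""
    return joint[x, y] / f_Y[y]

def f_Y_given_X(y, x):
    """Conditional p.d.f. of Y, given X = x (requires f_X(x) > 0)."""
    return joint[x, y] / f_X[x]

for y in range(4):
    print("f_X|Y(. |", y, "):", [str(f_X_given_Y(x, y)) for x in range(4)])
for x in range(4):
    print("f_Y|X(. |", x, "):", [str(f_Y_given_X(y, x)) for y in range(4)])
```

Each printed row sums to 1, which is the discrete counterpart of the integral identity used to show that conditional p.d.f.'s are p.d.f.'s.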


EXAMPLE 6 Refer to Example 2 and derive the marginal d.f.'s and p.d.f.'s, as well as the
conditional p.d.f.'s, involved.


DISCUSSION In (1), let $y \to \infty$ to obtain $F_X(x) = 1 - e^{-\lambda_1 x}$, $x > 0$, and
likewise $F_Y(y) = 1 - e^{-\lambda_2 y}$, $y > 0$, by letting $x \to \infty$. Next, by
differentiation, $f_X(x) = \lambda_1 e^{-\lambda_1 x}$, $x > 0$, and $f_Y(y) = \lambda_2 e^{-\lambda_2 y}$, $y > 0$, so that the r.v.'s X
and Y have the Negative Exponential distribution with parameters $\lambda_1$ and $\lambda_2$,
respectively. Finally, for $x > 0$ and $y > 0$:

\[
f_{X|Y}(x \mid y) = \frac{\lambda_1 \lambda_2 e^{-\lambda_1 x - \lambda_2 y}}{\lambda_2 e^{-\lambda_2 y}} = \lambda_1 e^{-\lambda_1 x} = f_X(x),
\quad \text{and likewise} \quad f_{Y|X}(y \mid x) = f_Y(y).
\]


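EXAMPLE 7 Refer to Example 4 and determine the marginal and conditional p.d.f.'s $f_X$, $f_Y$,
$f_{X|Y}$, and $f_{Y|X}$.


DISCUSSION We have:

\[
f_X(x) = \int_{x^2}^{1} c x^2 y\,dy = c x^2 \int_{x^2}^{1} y\,dy = \frac{21}{4} x^2 (1 - x^4), \quad 0 < x < 1,
\]

\[
f_Y(y) = \int_0^{\sqrt{y}} c x^2 y\,dx = c y \int_0^{\sqrt{y}} x^2\,dx = \frac{7}{2} y^2 \sqrt{y}, \quad 0 < y < 1,
\]

and therefore

\[
f_{X|Y}(x \mid y) = \frac{\frac{21}{2} x^2 y}{\frac{7}{2} y^2 \sqrt{y}} = \frac{3 x^2}{y \sqrt{y}}, \quad 0 < x \le \sqrt{y},\ 0 < y < 1,
\]

\[
f_{Y|X}(y \mid x) = \frac{\frac{21}{2} x^2 y}{\frac{21}{4} x^2 (1 - x^4)} = \frac{2y}{1 - x^4}, \quad x^2 \le y < 1,\ 0 < x < 1.
\]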


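EXAMPLE 8 Consider the function $f_{X,Y}$ defined by:
$f_{X,Y}(x, y) = 8xy$, $0 < x \le y < 1$.
(i) Verify that $f_{X,Y}$ is, indeed, a p.d.f.
(ii) Determine the marginal and conditional p.d.f.'s.
(iii) Calculate the quantities: EX, $EX^2$, Var(X), EY, $EY^2$, Var(Y), and E(XY).


DISCUSSION
(i) Since $f_{X,Y}$ is nonnegative, all we have to check is that it integrates to 1. In
fact,

\[
\int_0^1 \int_0^y 8xy\,dx\,dy = 8 \int_0^1 y \Big( \int_0^y x\,dx \Big) dy = 4 \int_0^1 y^3\,dy = 1.
\]

(ii)

\[
f_X(x) = \int_x^1 8xy\,dy = 8x \int_x^1 y\,dy = 4x(1 - x^2), \quad 0 < x < 1,
\]

\[
f_Y(y) = \int_0^y 8xy\,dx = 8y \int_0^y x\,dx = 4y^3, \quad 0 < y < 1,
\]

and therefore

\[
f_{X|Y}(x \mid y) = \frac{8xy}{4y^3} = \frac{2x}{y^2}, \quad 0 < x \le y < 1,
\]

\[
f_{Y|X}(y \mid x) = \frac{8xy}{4x(1 - x^2)} = \frac{2y}{1 - x^2}, \quad 0 < x \le y < 1.
\]

(iii)

\[
EX = \int_0^1 x \cdot 4x(1 - x^2)\,dx = 4 \Big( \int_0^1 x^2\,dx - \int_0^1 x^4\,dx \Big) = \frac{8}{15},
\]

\[
EX^2 = \int_0^1 x^2 \cdot 4x(1 - x^2)\,dx = 4 \Big( \int_0^1 x^3\,dx - \int_0^1 x^5\,dx \Big) = \frac{1}{3}, \quad \text{so that}
\]

\[
Var(X) = EX^2 - (EX)^2 = \frac{1}{3} - \frac{64}{225} = \frac{11}{225};
\]

\[
EY = \int_0^1 y \cdot 4y^3\,dy = 4 \int_0^1 y^4\,dy = \frac{4}{5},
\]

\[
EY^2 = \int_0^1 y^2 \cdot 4y^3\,dy = 4 \int_0^1 y^5\,dy = \frac{2}{3}, \quad \text{so that}
\]

\[
Var(Y) = EY^2 - (EY)^2 = \frac{2}{3} - \frac{16}{25} = \frac{2}{75} = \frac{6}{225}.
\]

Finally,

\[
E(XY) = \int_0^1 \int_0^y xy \cdot 8xy\,dx\,dy = 8 \int_0^1 y^2 \Big( \int_0^y x^2\,dx \Big) dy
= \frac{8}{3} \int_0^1 y^5\,dy = \frac{4}{9}.
\]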
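All the quantities in part (iii) are mixed moments of the form $E(X^i Y^j)$, and for this particular p.d.f. they have a simple closed form. The sketch below assumes the p.d.f. $f_{X,Y}(x, y) = 8xy$ on $0 < x \le y < 1$ from the example; the function name `moment` is hypothetical, and the closed form is derived in its docstring.

```python
from fractions import Fraction as Fr

def moment(i, j):
    """E(X^i Y^j) for the joint p.d.f. 8xy on 0 < x <= y < 1.

    Inner integral: int_0^y x^(i+1) dx = y^(i+2) / (i+2).
    Outer integral: int_0^1 y^(i+j+3) dy = 1 / (i+j+4).
    Hence E(X^i Y^j) = 8 / ((i+2)(i+j+4)).
    """
    return Fr(8, (i + 2) * (i + j + 4))

EX, EX2 = moment(1, 0), moment(2, 0)
EY, EY2 = moment(0, 1), moment(0, 2)
var_X = EX2 - EX**2
var_Y = EY2 - EY**2
EXY = moment(1, 1)

print(EX, EX2, var_X)
print(EY, EY2, var_Y)
print(EXY)
```

Taking i = j = 0 returns 1, which reproduces the verification in part (i) that the function integrates to 1.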


Once a conditional p.d.f. is at hand, an expectation can be defined as done in
relations (1), (2), and (3) of Chapter 3. However, a modified notation will be
needed to reveal the fact that the expectation is calculated with respect to a
conditional p.d.f. The resulting expectation is the conditional expectation of
one r.v., given the other r.v., as specified below.

\[
E(X \mid Y = y_j) = \sum_{x_i \in \Re} x_i f_{X|Y}(x_i \mid y_j) \quad \text{or} \quad
E(X \mid Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx, \tag{2}
\]

for the discrete and continuous case, respectively; similarly:

\[
E(Y \mid X = x_i) = \sum_{y_j \in \Re} y_j f_{Y|X}(y_j \mid x_i) \quad \text{or} \quad
E(Y \mid X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy. \tag{3}
\]

Of course, it is understood that the preceding expectations exist as explained
right after relations (2) and (3) in Chapter 3 were defined. However, unlike
the results in (1)-(3) in Chapter 3 which are numbers, in relations (2) and (3)
above the outcomes depend on $y_j$ or y, and $x_i$ or x, respectively, which reflect
the values that the "conditioning" r.v.'s assume. For illustrative purposes, let
us calculate some conditional expectations.


EXAMPLE 9 In reference to Example 1, calculate: $E(X \mid Y = 0)$ and $E(Y \mid X = 2)$.


DISCUSSION In Example 5, we have calculated the conditional p.d.f.'s
$f_{X|Y}(\cdot \mid 0)$ and $f_{Y|X}(\cdot \mid 2)$. Therefore:

\[
E(X \mid Y = 0) = 0 \times \frac{5}{26} + 1 \times \frac{21}{26} + 2 \times 0 + 3 \times 0 = \frac{21}{26} \approx 0.808, \quad \text{and}
\]

\[
E(Y \mid X = 2) = 0 \times 0 + 1 \times \frac{8}{18} + 2 \times \frac{7}{18} + 3 \times \frac{3}{18} = \frac{31}{18} \approx 1.722.
\]

So, if in the y-line there are no customers waiting, the expected number of
those waiting in the x-line will be about 0.81; likewise, if there are 2 customers
waiting in the x-line, the expected number of those waiting in the y-line will
be about 1.72.
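For a discrete pair, a conditional expectation is a weighted average over one row or one column of the joint table. The sketch below recomputes the two values above from Table 4.1; the function names `cond_mean_X` and `cond_mean_Y` are illustrative assumptions.

```python
from fractions import Fraction as Fr

# Joint p.d.f. of Table 4.1 in hundredths, indexed as TABLE[y][x].
TABLE = [[5, 21, 0, 0], [20, 26, 8, 0], [0, 6, 7, 2], [0, 0, 3, 2]]

def cond_mean_X(y):
    """E(X | Y = y): average of x weighted by f(x, y) / f_Y(y)."""
    row_total = sum(TABLE[y])  # f_Y(y) in hundredths
    return sum(Fr(x * TABLE[y][x], row_total) for x in range(4))

def cond_mean_Y(x):
    """E(Y | X = x): average of y weighted by f(x, y) / f_X(x)."""
    col_total = sum(TABLE[y][x] for y in range(4))  # f_X(x) in hundredths
    return sum(Fr(y * TABLE[y][x], col_total) for y in range(4))

print(cond_mean_X(0), float(cond_mean_X(0)))
print(cond_mean_Y(2), float(cond_mean_Y(2)))
```

The common factor of 1/100 cancels in each ratio, so the table can be used in hundredths without rescaling.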


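EXAMPLE 10 In reference to Example 2, calculate: $E(X \mid Y = y)$ and $E(Y \mid X = x)$.

DISCUSSION In Example 6, we have found that $f_{X|Y}(x \mid y) = f_X(x) = \lambda_1 e^{-\lambda_1 x}$
($x > 0$), and $f_{Y|X}(y \mid x) = f_Y(y) = \lambda_2 e^{-\lambda_2 y}$ ($y > 0$), so that:
$E(X \mid Y = y) = \int_0^\infty x \lambda_1 e^{-\lambda_1 x}\,dx = 1/\lambda_1$, and
$E(Y \mid X = x) = \int_0^\infty y \lambda_2 e^{-\lambda_2 y}\,dy = 1/\lambda_2$, by
integration by parts, or simply by utilizing known results.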


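EXAMPLE 11 In reference to Example 4, calculate: $E(X \mid Y = y)$ and $E(Y \mid X = x)$.


DISCUSSION In Example 7, we have found that $f_{X|Y}(x \mid y) = \frac{3x^2}{y\sqrt{y}}$,
$0 < x \le \sqrt{y}$, so that

\[
E(X \mid Y = y) = \int_0^{\sqrt{y}} x \cdot \frac{3x^2}{y\sqrt{y}}\,dx = \frac{3}{y\sqrt{y}} \int_0^{\sqrt{y}} x^3\,dx = \frac{3\sqrt{y}}{4}, \quad 0 < y < 1.
\]

Also, $f_{Y|X}(y \mid x) = \frac{2y}{1 - x^4}$, $x^2 \le y < 1$, so that

\[
E(Y \mid X = x) = \int_{x^2}^{1} y \cdot \frac{2y}{1 - x^4}\,dy = \frac{2}{1 - x^4} \int_{x^2}^{1} y^2\,dy = \frac{2(1 - x^6)}{3(1 - x^4)}, \quad 0 < x < 1.
\]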


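EXAMPLE 12 In reference to Example 8, calculate: $E(X \mid Y = y)$ and $E(Y \mid X = x)$.


DISCUSSION In Example 8(ii), we have found that $f_{X|Y}(x \mid y) = \frac{2x}{y^2}$,
$0 < x \le y < 1$, and $f_{Y|X}(y \mid x) = \frac{2y}{1 - x^2}$, $0 < x \le y < 1$, so that

\[
E(X \mid Y = y) = \int_0^y x \cdot \frac{2x}{y^2}\,dx = \frac{2}{y^2} \int_0^y x^2\,dx = \frac{2y}{3}, \quad 0 < y < 1,
\]

and

\[
E(Y \mid X = x) = \int_x^1 y \cdot \frac{2y}{1 - x^2}\,dy = \frac{2}{1 - x^2} \int_x^1 y^2\,dy = \frac{2(1 - x^3)}{3(1 - x^2)}, \quad 0 < x < 1.
\]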


Now, for the discrete case, set $g(y_j) = E(X \mid Y = y_j)$ and proceed to replace
$y_j$ by the r.v. Y. We obtain the r.v. $g(Y) = E(X \mid Y)$, and then it makes sense to
talk about its expectation $Eg(Y) = E[E(X \mid Y)]$. Although the $E(X \mid Y = y_j)$
depends on the particular values of Y, it turns out that its average does not,
and, indeed, is the same as the EX. More precisely, it holds:

\[
E[E(X \mid Y)] = EX \quad \text{and} \quad E[E(Y \mid X)] = EY. \tag{4}
\]

That is, the expectation of the conditional expectation of X is equal to its
expectation, and likewise for Y. Relation (4) is true both for the discrete and
the continuous case. Its justification for the continuous case, for instance, is
as follows:
We have $g(Y) = E(X \mid Y)$ and therefore

\[
\begin{aligned}
Eg(Y) &= \int_{-\infty}^{\infty} g(y) f_Y(y)\,dy = \int_{-\infty}^{\infty} E(X \mid y) f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} \Big[ \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx \Big] f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [x f_{X|Y}(x \mid y) f_Y(y)]\,dx\,dy
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_{X,Y}(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} x \Big[ \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \Big] dx = \int_{-\infty}^{\infty} x f_X(x)\,dx = EX;
\end{aligned}
\]

i.e., $Eg(Y) = E[E(X \mid Y)] = EX$.

REMARK 1 However, $Var[E(X \mid Y)] \le Var(X)$ with equality holding if and
only if X is a function of Y (with probability 1). A proof of this fact may be
found in Section 5.3.1 in the book A Course in Mathematical Statistics, 2nd
edition (1997), Academic Press, by G. G. Roussas.
EXAMPLE 13 Verify the first relation $E[E(X \mid Y)] = EX$ in (4) for Examples 4 and 8.


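DISCUSSION By Example 7, $f_X(x) = \frac{21}{4} x^2 (1 - x^4)$, $0 < x < 1$, so that

\[
EX = \int_0^1 x \cdot \frac{21}{4} x^2 (1 - x^4)\,dx = \frac{21}{4} \Big( \int_0^1 x^3\,dx - \int_0^1 x^7\,dx \Big) = \frac{21}{32}.
\]

From Example 11, $E(X \mid Y) = \frac{3\sqrt{Y}}{4}$, $0 < Y < 1$, whereas, from Example 7,
$f_Y(y) = \frac{7}{2} y^2 \sqrt{y}$, $0 < y < 1$, so that

\[
E[E(X \mid Y)] = \int_0^1 \frac{3\sqrt{y}}{4} \cdot \frac{7}{2} y^2 \sqrt{y}\,dy = \frac{21}{8} \int_0^1 y^3\,dy = \frac{21}{32} = EX.
\]

(However, $Var[E(X \mid Y)] = Var\big(\frac{3\sqrt{Y}}{4}\big) = \frac{9}{16} Var(\sqrt{Y}) = \frac{9}{16}[EY - (E\sqrt{Y})^2]
= \frac{9}{16}\big(\frac{7}{9} - \frac{49}{64}\big) = \frac{7}{1{,}024} < \frac{553}{15{,}360} = Var(X)$.)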

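Also, from Examples 12 and 8(ii), $E(X \mid Y) = \frac{2Y}{3}$, $0 < Y < 1$, and
$f_Y(y) = 4y^3$, $0 < y < 1$, $EX = \frac{8}{15}$ by Example 8, so that

\[
E[E(X \mid Y)] = \int_0^1 \frac{2y}{3} \cdot 4y^3\,dy = \frac{8}{3} \int_0^1 y^4\,dy = \frac{8}{15} = EX.
\]

(However, $Var[E(X \mid Y)] = Var\big(\frac{2Y}{3}\big) = \frac{4}{9} Var(Y) = \frac{4}{9} \times \frac{2}{75} = \frac{8}{675} < \frac{11}{225} = Var(X)$.)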
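Relation (4) and the variance inequality of Remark 1 can be checked numerically for Example 8. The sketch below is illustrative only: it assumes the formulas $f_Y(y) = 4y^3$, $f_X(x) = 4x(1 - x^2)$, and $E(X \mid Y = y) = 2y/3$ obtained in Examples 8 and 12, and uses a simple midpoint rule (the helper `midpoint` is hypothetical) instead of exact integration.

```python
def midpoint(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f_Y = lambda y: 4 * y**3              # marginal p.d.f. of Y (Example 8)
f_X = lambda x: 4 * x * (1 - x**2)    # marginal p.d.f. of X (Example 8)
cond_mean = lambda y: 2 * y / 3       # E(X | Y = y) (Example 12)

# Left- and right-hand sides of E[E(X | Y)] = EX.
lhs = midpoint(lambda y: cond_mean(y) * f_Y(y), 0.0, 1.0)
rhs = midpoint(lambda x: x * f_X(x), 0.0, 1.0)

# Variances of E(X | Y) and of X, to be compared as in Remark 1.
var_cond_mean = midpoint(lambda y: cond_mean(y)**2 * f_Y(y), 0.0, 1.0) - lhs**2
var_X = midpoint(lambda x: x**2 * f_X(x), 0.0, 1.0) - rhs**2

print(lhs, rhs)
print(var_cond_mean, var_X)
```

The two means agree, whereas the variance of the conditional expectation is strictly smaller than Var(X), since X is not a function of Y in this example.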

In addition to the conditional expectation of X, given Y, one may define
the conditional variance of X, given Y, by utilizing the conditional p.d.f.
and formula (8) in Chapter 3; the notation to be used is $Var(X \mid Y = y_j)$
or $Var(X \mid Y = y)$ for the discrete and continuous case, respectively. Thus:

\[
Var(X \mid Y = y_j) = \sum_{x_i \in \Re} [x_i - E(X \mid Y = y_j)]^2 f_{X|Y}(x_i \mid y_j), \tag{5}
\]

and

\[
Var(X \mid Y = y) = \int_{-\infty}^{\infty} [x - E(X \mid Y = y)]^2 f_{X|Y}(x \mid y)\,dx, \tag{6}
\]

for the discrete and the continuous case, respectively. The conditional
variances depend on the values of the conditioning r.v., as was the case for the
conditional expectations. From formulas (5) and (6), it is not hard to see (see
also Exercise 2.20) that:

\[
\begin{aligned}
Var(X \mid Y = y_j) &= E(X^2 \mid Y = y_j) - [E(X \mid Y = y_j)]^2 \quad \text{or} \\
Var(X \mid Y = y) &= E(X^2 \mid Y = y) - [E(X \mid Y = y)]^2,
\end{aligned} \tag{7}
\]

for the discrete and the continuous case, respectively.
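EXAMPLE 14 In reference to Example 8, determine $Var(X \mid Y = y)$ by using the second
formula in (7).
DISCUSSION By (7),

\[
\begin{aligned}
Var(X \mid Y = y) &= E(X^2 \mid Y = y) - [E(X \mid Y = y)]^2 \\
&= \int_0^y x^2 \cdot \frac{2x}{y^2}\,dx - \Big( \frac{2y}{3} \Big)^2 \quad \text{(by Examples 8(ii) and 12)} \\
&= \frac{2}{y^2} \int_0^y x^3\,dx - \frac{4y^2}{9} = \frac{y^2}{2} - \frac{4y^2}{9} = \frac{y^2}{18}, \quad 0 < y < 1.
\end{aligned}
\]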
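The result $Var(X \mid Y = y) = y^2/18$ can be confirmed from the conditional moments. The sketch below assumes the conditional p.d.f. $f_{X|Y}(x \mid y) = 2x/y^2$, $0 < x \le y$, found in Example 8(ii); the function names `cond_moment` and `cond_var` are illustrative.

```python
from fractions import Fraction as Fr

def cond_moment(k, y):
    """E(X^k | Y = y) for f(x | y) = 2x / y^2 on 0 < x <= y.

    int_0^y x^k (2x / y^2) dx = (2 / y^2) * y^(k+2) / (k+2) = 2 y^k / (k+2).
    """
    return Fr(2, k + 2) * y**k

def cond_var(y):
    """Var(X | Y = y) via the second formula in (7)."""
    return cond_moment(2, y) - cond_moment(1, y) ** 2

# Compare with y^2 / 18 at a few sample values of y in (0, 1).
for y in (Fr(1, 4), Fr(1, 2), Fr(9, 10)):
    assert cond_var(y) == y**2 / 18
    print(y, cond_var(y))
```

The sample points are arbitrary; the identity holds for every y in (0, 1) because both sides are multiples of $y^2$.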





2.1 Refer to Exercise 1.1 and calculate the marginal p.d.f.'s fx and fy. 


2.2 Refer to Exercise 1.2 and calculate the marginal p.d.f.'s $f_X$ and $f_Y$.
2.3 If the joint p.d.f. of the r.v.'s X and Y is given by the following table,
determine the marginal p.d.f.'s $f_X$ and $f_Y$.


y\x —4 —2 2 4 
—2 0 0.25 0 0 
-1 0 0 0 0.25 

1 0.25 0 0 0 
2 0 0 0.25 0 


2.4 The r.v.'s X and Y take on the values 1, 2, and 3, as indicated in the 
following table: 


y\x     1       2       3
1      2/36    2/36    3/36
2      1/36    10/36   3/36
3      4/36    5/36    6/36


(i) Determine the marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) Determine the conditional p.d.f.'s $f_{X|Y}(\cdot \mid y)$ and $f_{Y|X}(\cdot \mid x)$.


2.5 The r.v.'s X and Y have joint p.d.f. $f_{X,Y}$ given by the entries of the following
table:

y\x     0       1       2       3
1      1/8     1/16    3/16    1/8
2      1/16    1/16    1/8     1/4

(i) Determine the marginal p.d.f.'s $f_X$ and $f_Y$, and the conditional p.d.f.
$f_{X|Y}(\cdot \mid y)$, $y = 1, 2$.
(ii) Calculate: EX, EY, $E(X \mid Y = y)$, $y = 1, 2$, and $E[E(X \mid Y)]$.
(iii) Compare EX and $E[E(X \mid Y)]$.
(iv) Calculate: Var(X) and Var(Y).


2.6 Let the r.v.'s X and Y have the joint p.d.f.:

$$f_{X,Y}(x, y) = \frac{2}{n(n+1)}, \quad y = 1, \ldots, x; \; x = 1, \ldots, n.$$

Then compute:
(i) The marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) The conditional p.d.f.'s $f_{X|Y}(\cdot \mid y)$ and $f_{Y|X}(\cdot \mid x)$.
(iii) The conditional expectations E(X | Y = y) and E(Y | X = x).

Hint: Recall that $\sum_{t=1}^{n} t = \frac{n(n+1)}{2}$.
2.7 In reference to Exercise 1.3, calculate the marginal p.d.f.'s fx and fy. 
2.8 Determine the marginal p.d.f.'s of the r.v.’s X and Y whose joint p.d.f. is 
given by: 
6 
fir D= ræ)  0<w<l 0<y<l 
2.9 Let X and Y be two r.v.'s with joint p.d.f. given by:
Fira yY) = ye,  0<ys<zxw<oo. 


(i) Determine the marginal p.d.f.'s $f_X$ and $f_Y$, and specify the range of
the arguments involved.
(ii) Determine the conditional p.d.f.'s $f_{X|Y}(\cdot \mid y)$ and $f_{Y|X}(\cdot \mid x)$, and
specify the range of the arguments involved.
(iii) Calculate the (conditional) probability P(X > 2log2| Y= log2), 
where always log stands for the natural logarithm. 


2.10 The joint p.d.f. of the r.v.'s X and Y is given by:

$$f_{X,Y}(x, y) = x e^{-(x+y)}, \quad x > 0, \; y > 0.$$

(i) Determine the marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) Determine the conditional p.d.f. $f_{Y|X}(\cdot \mid x)$.
(iii) Calculate the probability P(X > log 4), where always log stands for 
the natural logarithm. 


2.11 The joint p.d.f. of the r.v.'s X and Y is given by:

$$f_{X,Y}(x, y) = \frac{1}{2}\, y e^{-xy}, \quad 0 < x < \infty, \; 0 < y < 2.$$

(i) Determine the marginal p.d.f. $f_Y$.
(ii) Find the conditional p.d.f. $f_{X|Y}(\cdot \mid y)$, and evaluate it at y = 1/2.
(iii) Compute the conditional expectation E(X | Y = y), and evaluate it 
at y = 1/2. 


2.12 In reference to Exercise 1.4, calculate: 
(i) The marginal p.d.f.'s $f_X$, $f_Y$, and the conditional p.d.f. $f_{Y|X}(\cdot \mid x)$; in
all cases, specify the range of the variables involved.
(ii) EY and E(Y | X = x).
(iii) E[E(Y | X)] and observe that it is equal to EY.
(iv) The probability P(Y > |X < 4). 


2.13 In reference to Exercise 1.7, calculate: 
(i) The marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) The conditional p.d.f.'s $f_{X|Y}(\cdot \mid y)$ and $f_{Y|X}(\cdot \mid x)$.
(iii) The probability P(X < 1).


2.14 In reference to Exercise 1.8, determine the marginal p.d.f. $f_Y$ and the
conditional p.d.f. $f_{X|Y}(\cdot \mid y)$.


2.15 In reference to Exercise 1.9:
(i) Determine the marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) Determine the conditional p.d.f. $f_{X|Y}(\cdot \mid y)$.
(iii) Calculate EX and E(X | Y = y).
(iv) Show that E[E(X | Y)] = EX.


2.16 In reference to Exercise 1.10, determine:
(i) The marginal p.d.f. $f_X$.
(ii) The conditional p.d.f. $f_{Y|X}(\cdot \mid x)$.


2.17 In reference to Exercise 1.11, determine:
(i) The marginal p.d.f. $f_Y$.
(ii) The conditional p.d.f. $f_{X|Y}(\cdot \mid y)$.
(iii) The marginal p.d.f. $f_X$.


2.18 (i) For a fixed y > 0, consider the function $f(x, y) = e^{-y}\, y^x / x!$, x =
0, 1, ... and show that it is the conditional p.d.f. of a r.v. X, given that
another r.v. Y = y.
(ii) Now, suppose that the marginal p.d.f. of Y is Negative Exponential
with parameter $\lambda = 1$. Determine the joint p.d.f. of the r.v.'s X and Y.
(iii) Show that the marginal p.d.f. $f_X$ is given by:

$$f_X(x) = \left(\frac{1}{2}\right)^{x+1}, \quad x = 0, 1, \ldots$$


2.19 Suppose the r.v. Y is distributed as $P(\lambda)$ and that the conditional p.d.f. of
a r.v. X, given Y = y, is B(y, p). Then show that:
(i) The marginal p.d.f. $f_X$ is Poisson with parameter $\lambda p$.
(ii) The conditional p.d.f. $f_{Y|X}(\cdot \mid x)$ is Poisson with parameter $\lambda q$ (with
q = 1 - p) over the set: x, x + 1, ....


2.20 (i) Let X and Y be two discrete r.v.'s with joint p.d.f. fx y. Then show 
that the conditional variance of X, given Y, satisfies the following 
relation: 


$$\mathrm{Var}(X \mid Y = y_j) = E(X^2 \mid Y = y_j) - [E(X \mid Y = y_j)]^2.$$


(ii) Establish the same relation, if the r.v.’s X and Y are of the continuous 
type. 


4.3 Expectation of a Function of Two r.v.'s, Joint and Marginal m.g.f.'s, Covariance, and Correlation Coefficient


In this section, a function of the r.v.'s X and Y is considered and its expectation
and variance are defined. As a special case, one obtains the joint m.g.f. of X
and Y, the covariance of X and Y, and their correlation coefficient. To this end,
let g be a real-valued function defined on $\Re^2$, so that g(X, Y) is a r.v. Then the
expectation of g(X, Y) is defined as in (6) in Chapter 3 except that the joint
p.d.f. of X and Y is to be used. Thus:

$$Eg(X, Y) = \sum_{x_i \in \Re,\, y_j \in \Re} g(x_i, y_j) f_{X,Y}(x_i, y_j) \quad\text{or}\quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f_{X,Y}(x, y)\,dx\,dy, \tag{8}$$
for the discrete and the continuous case, respectively, provided, of course, the
quantities defined exist. Properties analogous to those in (4) in Chapter 3 apply
here, too. Namely, for c and d constants:

$$E[cg(X, Y)] = cEg(X, Y), \qquad E[cg(X, Y) + d] = cEg(X, Y) + d. \tag{9}$$

Also, if h is another real-valued function, then (see also Exercise 3.17):

$$g(X, Y) \le h(X, Y) \ \text{implies}\ Eg(X, Y) \le Eh(X, Y), \tag{10}$$

and, in particular,

$$g(X) \le h(X) \ \text{implies}\ Eg(X) \le Eh(X). \tag{11}$$


For the special choice of the function $g(x, y) = e^{t_1 x + t_2 y}$, $t_1, t_2$ reals, the
expectation $E\exp(t_1 X + t_2 Y)$ defines a function in $t_1, t_2$ for those $t_1, t_2$ for
which this expectation is finite. That is:

$$M_{X,Y}(t_1, t_2) = Ee^{t_1 X + t_2 Y}, \quad (t_1, t_2) \in C \subseteq \Re^2. \tag{12}$$

Thus, for the discrete and the continuous case, we have, respectively,

$$M_{X,Y}(t_1, t_2) = \sum_{x_i \in \Re,\, y_j \in \Re} e^{t_1 x_i + t_2 y_j} f_{X,Y}(x_i, y_j), \tag{13}$$

and

$$M_{X,Y}(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f_{X,Y}(x, y)\,dx\,dy. \tag{14}$$

The function $M_{X,Y}(\cdot, \cdot)$ so defined is called the joint m.g.f. of the r.v.'s X and
Y. Clearly, $M_{X,Y}(0, 0) = 1$ for any X and Y, and it may happen that $C = \{(0, 0)\}$
or $C \subset \Re^2$ or $C = \Re^2$. Here are two examples of joint m.g.f.'s.


Refer to Example 1 and calculate the joint m.g.f. of the r.v.'s involved.

DISCUSSION For any $t_1, t_2 \in \Re$, we have, by means of (13):

$$M_{X,Y}(t_1, t_2) = \sum_{x=0}^{3}\sum_{y=0}^{3} e^{t_1 x + t_2 y} f_{X,Y}(x, y)$$

$$= 0.05 + 0.20e^{t_2} + 0.21e^{t_1} + 0.26e^{t_1+t_2} + 0.06e^{t_1+2t_2} + 0.08e^{2t_1+t_2} + 0.07e^{2t_1+2t_2} + 0.03e^{2t_1+3t_2} + 0.02e^{3t_1+2t_2} + 0.02e^{3t_1+3t_2}. \tag{15}$$

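Refer to Example 2 and calculate the joint m.g.f. of the r.v.'s involved.
DISCUSSION By means of (14), we have here:

$$M_{X,Y}(t_1, t_2) = \int_0^{\infty}\int_0^{\infty} e^{t_1 x + t_2 y}\,\lambda_1\lambda_2 e^{-\lambda_1 x - \lambda_2 y}\,dx\,dy = \int_0^{\infty} \lambda_1 e^{-(\lambda_1 - t_1)x}\,dx \cdot \int_0^{\infty} \lambda_2 e^{-(\lambda_2 - t_2)y}\,dy.$$

But $\int_0^{\infty} \lambda_1 e^{-(\lambda_1 - t_1)x}\,dx = -\frac{\lambda_1}{\lambda_1 - t_1} e^{-(\lambda_1 - t_1)x}\Big|_0^{\infty} = \frac{\lambda_1}{\lambda_1 - t_1}$, provided $t_1 < \lambda_1$, and likewise
$\int_0^{\infty} \lambda_2 e^{-(\lambda_2 - t_2)y}\,dy = \frac{\lambda_2}{\lambda_2 - t_2}$ for $t_2 < \lambda_2$. (We arrive at the same results
without integration by recalling (Example 6) that the r.v.'s X and Y have the
Negative Exponential distributions with parameters $\lambda_1$ and $\lambda_2$, respectively.)
Thus,

$$M_{X,Y}(t_1, t_2) = \frac{\lambda_1}{\lambda_1 - t_1} \times \frac{\lambda_2}{\lambda_2 - t_2}, \quad t_1 < \lambda_1, \; t_2 < \lambda_2. \tag{16}$$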

In (12), by setting successively $t_2 = 0$ and $t_1 = 0$, we obtain:

$$M_{X,Y}(t_1, 0) = Ee^{t_1 X} = M_X(t_1), \qquad M_{X,Y}(0, t_2) = Ee^{t_2 Y} = M_Y(t_2). \tag{17}$$

Thus, the m.g.f.'s of the individual r.v.'s X and Y are taken as marginals from
the joint m.g.f. of X and Y, and they are referred to as marginal m.g.f.'s. For
example, in reference to (15) and (16), we obtain:

$$M_X(t_1) = 0.25 + 0.53e^{t_1} + 0.18e^{2t_1} + 0.04e^{3t_1}, \quad t_1 \in \Re, \tag{18}$$

$$M_Y(t_2) = 0.26 + 0.54e^{t_2} + 0.15e^{2t_2} + 0.05e^{3t_2}, \quad t_2 \in \Re, \tag{19}$$

and

$$M_X(t_1) = \frac{\lambda_1}{\lambda_1 - t_1}, \; t_1 < \lambda_1, \qquad M_Y(t_2) = \frac{\lambda_2}{\lambda_2 - t_2}, \; t_2 < \lambda_2. \tag{20}$$
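As a numerical sanity check on (17) through (19), the following Python sketch rebuilds the joint p.d.f. of Example 1 from the coefficients of the exponentials in (15), evaluates the joint m.g.f. as the finite sum (13), and compares its restrictions to $t_2 = 0$ and $t_1 = 0$ with the marginal m.g.f.'s (18) and (19). The function names and the evaluation points are illustrative choices, not part of the text.

```python
import math

# Joint p.d.f. of Example 1, read off the coefficients of e^{t1*x + t2*y} in (15).
PMF = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.21, (1, 1): 0.26, (1, 2): 0.06,
       (2, 1): 0.08, (2, 2): 0.07, (2, 3): 0.03, (3, 2): 0.02, (3, 3): 0.02}

def joint_mgf(t1, t2):
    """M_{X,Y}(t1, t2) computed as the finite sum in (13)."""
    return sum(p * math.exp(t1 * x + t2 * y) for (x, y), p in PMF.items())

def mgf_x(t):
    """Marginal m.g.f. of X, as in (18)."""
    return 0.25 + 0.53 * math.exp(t) + 0.18 * math.exp(2 * t) + 0.04 * math.exp(3 * t)

def mgf_y(t):
    """Marginal m.g.f. of Y, as in (19)."""
    return 0.26 + 0.54 * math.exp(t) + 0.15 * math.exp(2 * t) + 0.05 * math.exp(3 * t)

# Relation (17): setting one argument to 0 yields the marginal m.g.f.
for t in (-1.0, 0.0, 0.3, 1.0):
    assert math.isclose(joint_mgf(t, 0.0), mgf_x(t))
    assert math.isclose(joint_mgf(0.0, t), mgf_y(t))
```

The assertions pass silently, which is consistent with the marginal p.d.f.'s of X and Y being the row and column sums of the joint p.d.f.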


The joint m.g.f., as defined in (12), has properties analogous to the ones
stated in (12) of Chapter 3. Namely, for $c_1, c_2$ and $d_1, d_2$ constants:

$$M_{c_1X+d_1,\,c_2Y+d_2}(t_1, t_2) = e^{d_1t_1+d_2t_2} M_{X,Y}(c_1t_1, c_2t_2). \tag{21}$$


Its simple justification is left as an exercise (see Exercise 3.2). 
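In the present context, a version of the properties stated in (13) of Chapter
3 is the following:

$$\frac{\partial}{\partial t_1} M_{X,Y}(t_1, t_2)\Big|_{t_1=t_2=0} = EX, \qquad \frac{\partial}{\partial t_2} M_{X,Y}(t_1, t_2)\Big|_{t_1=t_2=0} = EY, \tag{22}$$

and

$$\frac{\partial^2}{\partial t_1 \partial t_2} M_{X,Y}(t_1, t_2)\Big|_{t_1=t_2=0} = E(XY), \tag{23}$$

provided one may interchange the order of differentiating and taking expectations.
For example, for (23), we have:

$$\frac{\partial^2}{\partial t_1 \partial t_2} M_{X,Y}(t_1, t_2)\Big|_{t_1=t_2=0} = \frac{\partial^2}{\partial t_1 \partial t_2} Ee^{t_1X+t_2Y}\Big|_{t_1=t_2=0} = E\left(\frac{\partial^2}{\partial t_1 \partial t_2} e^{t_1X+t_2Y}\right)\Big|_{t_1=t_2=0} = E\left(XYe^{t_1X+t_2Y}\right)\Big|_{t_1=t_2=0} = E(XY).$$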
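Properties (22) and (23) can be illustrated numerically by approximating the partial derivatives of the joint m.g.f. at the origin with central finite differences. The sketch below does this for the joint p.d.f. of Example 1, recovered from the coefficients in (15); the step size h and the function names are assumptions made for the illustration, and the results are approximations rather than exact values.

```python
import math

# Joint p.d.f. of Example 1, read off the coefficients of e^{t1*x + t2*y} in (15).
PMF = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.21, (1, 1): 0.26, (1, 2): 0.06,
       (2, 1): 0.08, (2, 2): 0.07, (2, 3): 0.03, (3, 2): 0.02, (3, 3): 0.02}

def joint_mgf(t1, t2):
    """M_{X,Y}(t1, t2) as the finite sum in (13)."""
    return sum(p * math.exp(t1 * x + t2 * y) for (x, y), p in PMF.items())

def d_dt1(h=1e-3):
    """Central difference for the partial derivative in t1 at (0, 0); approximates EX."""
    return (joint_mgf(h, 0.0) - joint_mgf(-h, 0.0)) / (2 * h)

def d_dt2(h=1e-3):
    """Central difference for the partial derivative in t2 at (0, 0); approximates EY."""
    return (joint_mgf(0.0, h) - joint_mgf(0.0, -h)) / (2 * h)

def d2_dt1dt2(h=1e-3):
    """Central difference for the mixed partial derivative at (0, 0); approximates E(XY)."""
    return (joint_mgf(h, h) - joint_mgf(h, -h)
            - joint_mgf(-h, h) + joint_mgf(-h, -h)) / (4 * h * h)

ex, ey, exy = d_dt1(), d_dt2(), d2_dt1dt2()
print(ex, ey, exy)
```

The three approximations agree, up to discretization error, with the exact moments obtained directly from the p.d.f., namely the sums of x, y, and xy weighted by $f_{X,Y}(x, y)$.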
REMARK 2 Although properties (22) and (23) allow us to obtain moments
by means of the m.g.f.'s of the r.v.'s X and Y, the most significant property of
the m.g.f. is that it allows one (under certain conditions) to retrieve the distribution
of the r.v.'s X and Y. This is done through the so-called inversion formula.

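Now, select the function g as follows: g(x, y) = cx + dy, where c and d are
constants. Then, for the continuous case:

$$Eg(X, Y) = E(cX + dY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (cx + dy) f_{X,Y}(x, y)\,dx\,dy$$

$$= c\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f_{X,Y}(x, y)\,dx\,dy + d\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_{X,Y}(x, y)\,dx\,dy$$

$$= c\int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\right]dx + d\int_{-\infty}^{\infty} y\left[\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\right]dy$$

$$= c\int_{-\infty}^{\infty} x f_X(x)\,dx + d\int_{-\infty}^{\infty} y f_Y(y)\,dy = cEX + dEY,$$

i.e., assuming the expectations involved exist:

$$E(cX + dY) = cEX + dEY, \quad \text{where } c \text{ and } d \text{ are constants}. \tag{24}$$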


In the discrete case, integrals are replaced by summation signs. On account
of the usual properties of integrals and summations, property (24) applies to
a more general situation. Thus, for two functions $g_1$ and $g_2$, we have:

$$E[g_1(X, Y) + g_2(X, Y)] = Eg_1(X, Y) + Eg_2(X, Y), \tag{25}$$


provided the expectations involved exist. 

Next, suppose the r.v.'s X and Y have finite expectations and take g(x, y) =
(x - EX)(y - EY). Then Eg(X, Y) = E[(X - EX)(Y - EY)] is called the
covariance of the r.v.'s X and Y and is denoted by Cov(X, Y). Thus:

$$\mathrm{Cov}(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - (EX)(EY). \tag{26}$$

The second equality in (26) follows by multiplying out (X - EX)(Y - EY) and
applying property (25).

The variance of a single r.v. has been looked upon as a measure of dispersion
of the distribution of the r.v. Some motivation will be given subsequently to the
effect that the Cov(X, Y) may be thought of as a measure of the degree to which
X and Y tend to increase or decrease simultaneously when Cov(X, Y) > 0, or to
move in opposite directions when Cov(X, Y) < 0. This point is sufficiently
made by the following simple example.


Consider the events A and B with P(A)P(B) > 0 and set $X = I_A$ and $Y = I_B$ for
the indicator functions, where $I_A(s) = 1$ if $s \in A$ and $I_A(s) = 0$ if $s \in A^c$. Then,
clearly, EX = P(A), EY = P(B), and $XY = I_{A \cap B}$, so that $E(XY) = P(A \cap B)$.
It follows that $\mathrm{Cov}(X, Y) = P(A \cap B) - P(A)P(B)$. Next,

$$P(A)[P(Y = 1 \mid X = 1) - P(Y = 1)] = P(A \cap B) - P(A)P(B) = \mathrm{Cov}(X, Y), \tag{27}$$

$$P(A^c)[P(Y = 0 \mid X = 0) - P(Y = 0)] = P(A^c \cap B^c) - P(A^c)P(B^c) = P(A \cap B) - P(A)P(B) = \mathrm{Cov}(X, Y), \tag{28}$$

$$P(A^c)[P(Y = 1 \mid X = 0) - P(Y = 1)] = P(A^c \cap B) - P(A^c)P(B) = -[P(A \cap B) - P(A)P(B)] = -\mathrm{Cov}(X, Y), \tag{29}$$

$$P(A)[P(Y = 0 \mid X = 1) - P(Y = 0)] = P(A \cap B^c) - P(A)P(B^c) = -[P(A \cap B) - P(A)P(B)] = -\mathrm{Cov}(X, Y), \tag{30}$$

(see also Exercise 3.3).
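Relations (27) through (30) can be verified on a small concrete case. The Python sketch below uses a hypothetical sample space of 12 equally likely points and two hypothetical events A and B (both assumptions made purely for illustration), and evaluates the left-hand side of each relation with exact fractions.

```python
from fractions import Fraction as F

# Hypothetical finite sample space with equally likely points (illustration only).
OMEGA = set(range(12))
A = set(range(0, 6))   # hypothetical event A
B = set(range(2, 7))   # hypothetical event B
Ac, Bc = OMEGA - A, OMEGA - B

def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return F(len(event & OMEGA), len(OMEGA))

# Covariance of the indicators X = I_A and Y = I_B.
cov = prob(A & B) - prob(A) * prob(B)

def lhs(cond_event, target_event):
    """P(cond) * [P(target | cond) - P(target)], the left-hand side of (27)-(30)."""
    conditional = prob(target_event & cond_event) / prob(cond_event)
    return prob(cond_event) * (conditional - prob(target_event))

rel27 = lhs(A, B)     # X = 1, Y = 1
rel28 = lhs(Ac, Bc)   # X = 0, Y = 0
rel29 = lhs(Ac, B)    # X = 0, Y = 1
rel30 = lhs(A, Bc)    # X = 1, Y = 0
print(cov, rel27, rel28, rel29, rel30)
```

With these choices the covariance is positive, the quantities in (27) and (28) equal it, and those in (29) and (30) equal its negative.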


From (27) and (28), it follows that Cov(X, Y) > 0 if and only if P(Y = 1 | X =
1) > P(Y = 1), or P(Y = 0 | X = 0) > P(Y = 0). That is, Cov(X, Y) > 0 if and
only if, given that X has taken a "large" value (namely, 1), it is more likely that Y
does so as well than it otherwise would; also, given that X has taken a "small"
value (namely, 0), it is more likely that Y does so too than it otherwise would.
On the other hand, from relations (29) and (30), we see that Cov(X, Y) < 0 if
and only if P(Y = 1 | X = 0) > P(Y = 1), or P(Y = 0 | X = 1) > P(Y = 0).
That is, Cov(X, Y) < 0 if and only if, given that X has taken a "small" value, it
is more likely for Y to take a "large" value than it otherwise would, and given
that X has taken a "large" value, it is more likely for Y to take a "small" value
than it otherwise would.


As a further illustration of the significance of the covariance we proceed to 
calculate the Cov(X, Y) for the r.v.'s of Example 1. 


Refer to Example 1 and calculate the Cov(X, Y).

DISCUSSION In Example 5, the (marginal) p.d.f.'s $f_X$ and $f_Y$ were calculated.
Then: EX = 1.01 and EY = 0.99. Next, the r.v. XY is distributed as
follows, on the basis of Table 4.1.

xy         |  0      1      2      3      4      6      9
$f_{XY}$   | 0.46   0.26   0.14    0     0.07   0.05   0.02

Therefore E(XY) = 1.3 and then, by formula (26), Cov(X, Y) = 1.3 - 1.01 x
0.99 = 0.3001.
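The arithmetic of this example can be reproduced exactly. The Python sketch below takes the joint p.d.f. of Example 1 as recovered from the coefficients in (15) (stored in hundredths so that fractions stay exact), builds the distribution of the product XY, and applies formula (26). The helper name `expect` is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint p.d.f. of Example 1 in hundredths, as recovered from the coefficients in (15).
HUNDREDTHS = {(0, 0): 5, (0, 1): 20, (1, 0): 21, (1, 1): 26, (1, 2): 6,
              (2, 1): 8, (2, 2): 7, (2, 3): 3, (3, 2): 2, (3, 3): 2}
PMF = {xy: F(w, 100) for xy, w in HUNDREDTHS.items()}

def expect(g):
    """E g(X, Y) computed as the finite sum in (8)."""
    return sum(g(x, y) * p for (x, y), p in PMF.items())

# Distribution of the product XY: accumulate probability by value of x*y.
prod_pdf = defaultdict(F)
for (x, y), p in PMF.items():
    prod_pdf[x * y] += p

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
exy = expect(lambda x, y: x * y)
cov = exy - ex * ey   # formula (26)
print(float(ex), float(ey), float(exy), float(cov))
```

The dictionary `prod_pdf` reproduces the table of the distribution of XY, and `cov` is the covariance computed from it.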

Here the covariance is positive, and by comparing the values of the con- 
ditional probabilities in Example 5 with the appropriate unconditional prob- 
abilities, we see that this is consonant with the observation just made that X 
and Y tend to take simultaneously either “large” values or “small” values. (See 
also Example 19 later.) 


The result obtained next provides the range of values of the covariance of
two r.v.'s; it is also referred to as a version of the Cauchy-Schwarz inequality.

THEOREM 1

(i) Consider the r.v.'s X and Y with EX = EY = 0 and Var(X) =
Var(Y) = 1. Then always $-1 \le E(XY) \le 1$, and E(XY) = 1 if and
only if P(X = Y) = 1, and E(XY) = -1 if and only if P(X = -Y) = 1.

(ii) For any r.v.'s X and Y with finite expectations and positive variances
$\sigma_X^2$ and $\sigma_Y^2$, it always holds:

$$-\sigma_X\sigma_Y \le \mathrm{Cov}(X, Y) \le \sigma_X\sigma_Y, \tag{31}$$

and $\mathrm{Cov}(X, Y) = \sigma_X\sigma_Y$ if and only if $P[Y = EY + \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$,
$\mathrm{Cov}(X, Y) = -\sigma_X\sigma_Y$ if and only if $P[Y = EY - \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$.








PROOF

(i) Clearly, $0 \le E(X - Y)^2 = EX^2 + EY^2 - 2E(XY) = 2 - 2E(XY)$, so that
$E(XY) \le 1$; also, $0 \le E(X + Y)^2 = EX^2 + EY^2 + 2E(XY) = 2 + 2E(XY)$,
so that $-1 \le E(XY)$. Combining these results, we obtain $-1 \le E(XY) \le 1$.
As for equalities, observe that, if P(X = Y) = 1, then $E(XY) = EX^2 = 1$,
and if P(X = -Y) = 1, then $E(XY) = -EX^2 = -1$. Next, E(XY) = 1 implies
$E(X - Y)^2 = 0$ or Var(X - Y) = 0. But then P(X - Y = 0) = 1 or P(X = Y) = 1
(see Exercise 2.4 in Chapter 3). Also, E(XY) = -1 implies $E(X + Y)^2 = 0$ or
Var(X + Y) = 0, so that P(X = -Y) = 1 (by the exercise just cited).

(ii) Replace the r.v.'s X and Y by the r.v.'s $X^* = \frac{X - EX}{\sigma_X}$ and $Y^* = \frac{Y - EY}{\sigma_Y}$, for
which $EX^* = EY^* = 0$ and $\mathrm{Var}(X^*) = \mathrm{Var}(Y^*) = 1$. Then the inequalities
$-1 \le E(X^*Y^*) \le 1$ become

$$-1 \le E\left[\left(\frac{X - EX}{\sigma_X}\right)\left(\frac{Y - EY}{\sigma_Y}\right)\right] \le 1, \tag{32}$$

from which (31) follows. Also, $E(X^*Y^*) = 1$ if and only if $P(X^* = Y^*) = 1$
becomes $E[(X - EX)(Y - EY)] = \sigma_X\sigma_Y$ if and only if $P[Y = EY + \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$,
and $E(X^*Y^*) = -1$ if and only if $P(X^* = -Y^*) = 1$ becomes
$E[(X - EX)(Y - EY)] = -\sigma_X\sigma_Y$ if and only if $P[Y = EY - \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$.
A restatement of the last two conclusions is: $\mathrm{Cov}(X, Y) = \sigma_X\sigma_Y$ if and only
if $P[Y = EY + \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$, and $\mathrm{Cov}(X, Y) = -\sigma_X\sigma_Y$ if and only if
$P[Y = EY - \frac{\sigma_Y}{\sigma_X}(X - EX)] = 1$. ▲





From the definition of the Cov(X, Y) in (26), it follows that if X is measured
in units, call them a, and Y is measured in units, call them b, then Cov(X, Y) is
measured in units ab. Furthermore, because the variance of a r.v. ranges from
0 to $\infty$, it follows from (31) that Cov(X, Y) may vary from $-\infty$ to $\infty$. These two
characteristics of a covariance are rather undesirable and are both eliminated
through the standardization process of replacing X and Y by $\frac{X - EX}{\sigma_X}$ and $\frac{Y - EY}{\sigma_Y}$.
By (32), the range of the covariance of these standardized r.v.'s is the interval
[-1, 1]. This covariance is called the correlation coefficient of the r.v.'s X and
Y and is denoted by $\rho(X, Y)$. Thus:


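$$\rho(X, Y) = E\left[\left(\frac{X - EX}{\sigma_X}\right)\left(\frac{Y - EY}{\sigma_Y}\right)\right] = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\sigma_Y} = \frac{E(XY) - (EX)(EY)}{\sigma_X\sigma_Y}. \tag{33}$$

Furthermore, by (32):

$$-1 \le \rho(X, Y) \le 1, \tag{34}$$

and, by part (ii) of Theorem 1:

$$\rho(X, Y) = 1 \ \text{if and only if}\ P\left[Y = EY + \frac{\sigma_Y}{\sigma_X}(X - EX)\right] = 1, \tag{35}$$

$$\rho(X, Y) = -1 \ \text{if and only if}\ P\left[Y = EY - \frac{\sigma_Y}{\sigma_X}(X - EX)\right] = 1. \tag{36}$$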
ox 


The straight lines represented by $y = EY + \frac{\sigma_Y}{\sigma_X}(x - EX)$ and $y = EY - \frac{\sigma_Y}{\sigma_X}(x - EX)$
are depicted in Figure 4.4.


Figure 4.4 Lines of Perfect Linear Relation of x and y




















From relation (35), we have that $\rho(X, Y) = 1$ if and only if (X, Y) are linearly
related (with probability 1). On the other hand, from Example 17, we have that
Cov(X, Y) > 0 if and only if X and Y tend to take simultaneously either "large"
values or "small" values. Since Cov(X, Y) and $\rho(X, Y)$ have the same sign,
the same statement can be made about $\rho(X, Y)$, being positive if and only if
X and Y tend to take simultaneously either "large" values or "small" values.
The same arguments apply for the case that Cov(X, Y) < 0 (equivalently,
$\rho(X, Y) < 0$). This reasoning indicates that $\rho(X, Y)$ may be looked upon as a
measure of linear dependence between X and Y. The pair (X, Y) lies on the
line $y = EY + \frac{\sigma_Y}{\sigma_X}(x - EX)$ if $\rho(X, Y) = 1$; pairs identical to (X, Y) tend to be
arranged along this line, if $0 < \rho(X, Y) < 1$, and they tend to move further
and further away from this line as $\rho(X, Y)$ gets closer to 0; the pairs bear no
sign of linear tendency whatever, if $\rho(X, Y) = 0$. Rough arguments also hold
for the reverse assertions. For $0 < \rho(X, Y) \le 1$, the r.v.'s X and Y are said to
be positively correlated, and uncorrelated if $\rho(X, Y) = 0$. Likewise, the pair
(X, Y) lies on the line $y = EY - \frac{\sigma_Y}{\sigma_X}(x - EX)$ if $\rho(X, Y) = -1$; pairs identical
to (X, Y) tend to be arranged along this line if $-1 < \rho(X, Y) < 0$. Again, rough
arguments can also be made for the reverse assertions. For $-1 \le \rho(X, Y) < 0$,
the r.v.'s X and Y are said to be negatively correlated.

Actually, a more precise argument to this effect can be made by considering
the distance D of the (random) point (X, Y) from the lines $y = EY \pm \frac{\sigma_Y}{\sigma_X}(x - EX)$.
It can be seen that:

$$ED^2 = \frac{2\sigma_X^2\sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}\,(1 - |\rho(X, Y)|). \tag{37}$$

Then one may use the interpretation of the expectation as an average and
exploit (37) in order to arrive at the same reasoning but in a more rigorous
way.


Figure 4.5 
































As an illustration, let us calculate the $\rho(X, Y)$ for Examples 1 and 8.


EXAMPLE 19 In reference to Example 1, calculate the Cov(X, Y) and the $\rho(X, Y)$.


DISCUSSION From Table 4.1, we find $EX^2 = 1.61$, $EY^2 = 1.59$. By
Example 18, EX = 1.01, EY = 0.99, so that $\mathrm{Var}(X) = EX^2 - (EX)^2 = 0.5899$,
$\mathrm{Var}(Y) = EY^2 - (EY)^2 = 0.6099$. Since Cov(X, Y) = 0.3001 (by Example 18),
we have:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{0.3001}{\sqrt{0.5899 \times 0.6099}} \simeq 0.5.$$
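The computation in Example 19 can be retraced in a few lines of Python. The sketch below again assumes the joint p.d.f. of Example 1 as recovered from the coefficients in (15), and evaluates the variances, the covariance, and the correlation coefficient via (26) and (33); the helper name `expect` is illustrative.

```python
import math

# Joint p.d.f. of Example 1, read off the coefficients of e^{t1*x + t2*y} in (15).
PMF = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.21, (1, 1): 0.26, (1, 2): 0.06,
       (2, 1): 0.08, (2, 2): 0.07, (2, 3): 0.03, (3, 2): 0.02, (3, 3): 0.02}

def expect(g):
    """E g(X, Y) as the finite sum in (8)."""
    return sum(g(x, y) * p for (x, y), p in PMF.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2   # Var(X) = EX^2 - (EX)^2
var_y = expect(lambda x, y: y * y) - ey ** 2   # Var(Y) = EY^2 - (EY)^2
cov = expect(lambda x, y: x * y) - ex * ey     # formula (26)
rho = cov / math.sqrt(var_x * var_y)           # formula (33)
print(var_x, var_y, cov, rho)
```

The value of `rho` is close to 0.5, so that X and Y are positively correlated.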








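In reference to Example 8, calculate the Cov(X, Y) and the $\rho(X, Y)$.


DISCUSSION By Example 8(ii), $\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = \frac{4}{9} - \frac{8}{15} \times \frac{4}{5} = \frac{4}{225}$,
$\sqrt{\mathrm{Var}(X)} = \frac{\sqrt{11}}{15}$, $\sqrt{\mathrm{Var}(Y)} = \frac{\sqrt{6}}{15}$, so that

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{4}{\sqrt{66}} \simeq 0.492.$$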








Let X and Y be two r.v.'s with finite expectations and equal (finite) variances,
and set U = X + Y and V = X - Y. Then the r.v.'s U and V are uncorrelated.


DISCUSSION Indeed,

$$E(UV) = E[(X + Y)(X - Y)] = E(X^2 - Y^2) = EX^2 - EY^2,$$

$$(EU)(EV) = [E(X + Y)][E(X - Y)] = (EX + EY)(EX - EY) = (EX)^2 - (EY)^2,$$

so that

$$\mathrm{Cov}(U, V) = E(UV) - (EU)(EV) = [EX^2 - (EX)^2] - [EY^2 - (EY)^2] = \mathrm{Var}(X) - \mathrm{Var}(Y) = 0.$$


Figure 4.5 illustrates the behavior of the correlation coefficient $\rho(X, Y)$ of the
r.v.'s X and Y. In (a), $\rho(X, Y) = 1$, the r.v.'s X and Y are perfectly positively
linearly related. In (b), $\rho(X, Y) = -1$, the r.v.'s X and Y are perfectly negatively
linearly related. In (c), $0 < \rho(X, Y) < 1$, the r.v.'s X and Y are positively correlated.
In (d), $-1 < \rho(X, Y) < 0$, the r.v.'s X and Y are negatively correlated. In
(e), $\rho(X, Y) = 0$, the r.v.'s X and Y are uncorrelated.

The following result presents an interesting property of the correlation 
coefficient. 





THEOREM 2
Let X and Y be r.v.'s with finite first and second moments and positive
variances, and let $c_1, c_2, d_1, d_2$ be constants with $c_1c_2 \ne 0$. Then:

$$\rho(c_1X + d_1, c_2Y + d_2) = \pm\rho(X, Y), \ \text{with } + \text{ if } c_1c_2 > 0 \text{ and } - \text{ if } c_1c_2 < 0. \tag{38}$$











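PROOF Indeed, $\mathrm{Var}(c_1X + d_1) = c_1^2\mathrm{Var}(X)$, $\mathrm{Var}(c_2Y + d_2) = c_2^2\mathrm{Var}(Y)$, and
$\mathrm{Cov}(c_1X + d_1, c_2Y + d_2) = E\{[(c_1X + d_1) - E(c_1X + d_1)][(c_2Y + d_2) - E(c_2Y + d_2)]\}$
$= E[c_1(X - EX) \cdot c_2(Y - EY)] = c_1c_2E[(X - EX)(Y - EY)] = c_1c_2\mathrm{Cov}(X, Y)$.
Therefore $\rho(c_1X + d_1, c_2Y + d_2) = \frac{c_1c_2\mathrm{Cov}(X, Y)}{|c_1c_2|\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$, and the
conclusion follows. ▲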
EXAMPLE 22 Let X and Y be temperatures in two localities measured in the Celsius scale,
and let U and V be the same temperatures measured in the Fahrenheit scale.
Then $\rho(X, Y) = \rho(U, V)$, as it should be. This is so because $U = \frac{9}{5}X + 32$ and
$V = \frac{9}{5}Y + 32$, so that (38) applies with the + sign.
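The invariance in (38) can be observed on data as well, since the sample correlation coefficient is the correlation coefficient of the empirical distribution of the observed pairs. The Python sketch below uses hypothetical temperature readings (made up for illustration, not taken from the text) and a hand-written `pearson` helper, and compares the correlation before and after converting from Celsius to Fahrenheit.

```python
import math

def pearson(xs, ys):
    """Sample correlation coefficient: Cov / (sd_x * sd_y) of the empirical distribution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def to_fahrenheit(c):
    """U = (9/5) X + 32."""
    return 9.0 / 5.0 * c + 32.0

# Hypothetical paired readings in degrees Celsius at two localities.
celsius_a = [12.0, 15.5, 9.0, 20.0, 17.5]
celsius_b = [10.0, 14.0, 11.0, 18.5, 15.0]

rho_c = pearson(celsius_a, celsius_b)
rho_f = pearson([to_fahrenheit(c) for c in celsius_a],
                [to_fahrenheit(c) for c in celsius_b])
# A negative multiplier (c1*c2 < 0) flips the sign, as in (38).
rho_neg = pearson(celsius_a, [-c for c in celsius_b])
print(rho_c, rho_f, rho_neg)
```

Up to floating-point rounding, `rho_f` equals `rho_c` and `rho_neg` equals its negative.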


This section is concluded with the following result and an example. 





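THEOREM 3
For two r.v.'s X and Y with finite first and second moments, and (positive)
standard deviations $\sigma_X$ and $\sigma_Y$, it holds:

$$\mathrm{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\mathrm{Cov}(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2\sigma_X\sigma_Y\rho(X, Y), \tag{39}$$

and

$$\mathrm{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 \ \text{if } X \text{ and } Y \text{ are uncorrelated}. \tag{40}$$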








PROOF Since (40) follows immediately from (39), and $\mathrm{Cov}(X, Y) = \sigma_X\sigma_Y \times \rho(X, Y)$,
it suffices to establish only the first equality in (39). Indeed,

$$\mathrm{Var}(X + Y) = E[(X + Y) - E(X + Y)]^2 = E[(X - EX) + (Y - EY)]^2$$

$$= E(X - EX)^2 + E(Y - EY)^2 + 2E[(X - EX)(Y - EY)]$$

$$= \sigma_X^2 + \sigma_Y^2 + 2\mathrm{Cov}(X, Y). \ ▲$$


EXAMPLE 23 In reference to Examples 1 and 8 and by means of results obtained in Examples
19, 8(iii), and 20, respectively, calculate Var(X + Y).


DISCUSSION By (39),

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)$$

$$= 0.5899 + 0.6099 + 2 \times 0.3001 = 1.8 \ \text{for Example 1, and}$$

$$= \frac{11}{225} + \frac{2}{75} + 2 \times \frac{4}{225} = \frac{1}{9} \ \text{for Example 8.}$$
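For Example 1, the value of Var(X + Y) given by (39) can be cross-checked by computing the variance of the sum directly from the joint p.d.f. The Python sketch below does both with exact fractions, assuming the joint p.d.f. of Example 1 as recovered from the coefficients in (15); the helper name `expect` is illustrative.

```python
from fractions import Fraction as F

# Joint p.d.f. of Example 1 in hundredths, as recovered from the coefficients in (15).
HUNDREDTHS = {(0, 0): 5, (0, 1): 20, (1, 0): 21, (1, 1): 26, (1, 2): 6,
              (2, 1): 8, (2, 2): 7, (2, 3): 3, (3, 2): 2, (3, 3): 2}
PMF = {xy: F(w, 100) for xy, w in HUNDREDTHS.items()}

def expect(g):
    """E g(X, Y) as the finite sum in (8)."""
    return sum(g(x, y) * p for (x, y), p in PMF.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov = expect(lambda x, y: x * y) - ex * ey

# Right-hand side of (39).
var_sum_formula = var_x + var_y + 2 * cov

# Direct computation: Var(S) = E(S^2) - (ES)^2 with S = X + Y.
es = expect(lambda x, y: x + y)
es2 = expect(lambda x, y: (x + y) ** 2)
var_sum_direct = es2 - es ** 2

print(float(var_sum_formula), float(var_sum_direct))
```

The two quantities coincide, as (39) requires; the covariance term contributes 2 x 0.3001 to the total, which would be missed if X and Y were wrongly treated as uncorrelated.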


3.1 Let X and Y be the r.v.'s denoting the number of sixes when two fair dice 
are rolled independently 15 times each. Determine the E(X + Y). 


3.2 Show that the joint m.g.f. of two r.v.'s X and Y satisfies the following
property, where $c_1, c_2, d_1$, and $d_2$ are constants.

$$M_{c_1X+d_1,\,c_2Y+d_2}(t_1, t_2) = e^{d_1t_1+d_2t_2} M_{X,Y}(c_1t_1, c_2t_2).$$

3.3 Provide a justification of relations (28)-(30). That is:
(i) $P(A^c \cap B^c) - P(A^c)P(B^c) = P(A \cap B) - P(A)P(B)$.
(ii) $P(A^c \cap B) - P(A^c)P(B) = -P(A \cap B) + P(A)P(B)$.
(iii) $P(A \cap B^c) - P(A)P(B^c) = -P(A \cap B) + P(A)P(B)$.


3.4 Let X and Y be two r.v.'s with EX = EY = 0. Then, if Var(X — Y) = 0, it 
follows that P(X = Y) = 1, and if Var(X + Y) = 0, then P(X = —Y) = 1. 


Hint: Use Exercise 2.4 in Chapter 3. 


3.5 In reference to Exercise 2.1 (see also Exercise 1.1), calculate: 
(i) EX, EY, Var(X), and Var(Y). 
(ii) Cov(X, Y) and p(X, Y). 
(iii) Decide on the kind of correlation of the r.v.'s X and Y. 


3.6 Refer to Exercises 1.2 and 2.2 and calculate: 
(i) EX, EY, Var(X), Var(Y). 
(ii) E(XY), Cov(X, Y). 
(iii) p(X, Y). 
(iv) What kind of correlation, if any, do the r.v.'s X and Y exhibit? 


3.7 In reference to Exercise 2.3: 
(i) Calculate EX, EY, Var(X), and Var(Y). 
(ii) Calculate Cov(X, Y) and p(X, Y). 
(iii) Plot the points (-4, 1), (-2, -2), (2, 2), and (4, -1), and reconcile this
graph with the value of $\rho(X, Y)$ found in part (ii).


3.8 In reference to Exercise 2.4, calculate the following quantities: 
(i) EX, EY, Var(X), and Var(Y).
(ii) Cov(X, Y) and $\rho(X, Y)$.
3.9 Refer to Exercise 2.5, and calculate the Cov(X, Y) and the p(X, Y). 


3.10 Let X be a r.v. taking on the values -2, -1, 1, 2, each with probability 1/4,
and define the r.v. Y by: $Y = X^2$. Then calculate the quantities: EX, Var(X),
EY, Var(Y), E(XY), Cov(X, Y), and $\rho(X, Y)$. Are you surprised by the
value of $\rho(X, Y)$? Explain.


3.11 Refer to Example 8 and compute the covariance Cov(X, Y) and the cor- 
relation coefficient p(X, Y). Decide on the kind of correlation of the r.v.'s 
X and Y. 


3.12 In reference to Exercise 2.7 (see also Exercise 1.3), calculate: 
(i) The expectations EX and EY. 
(ii) The variances Var(X) and Var(Y). 
(iii) The covariance Cov(X, Y) and the correlation coefficient p(X, Y). 
(iv) On the basis of part (iii), decide on the kind of correlation of the r.v.'s 
X and Y. 


3.13 In reference to Exercise 2.8, calculate: 
(i) The expectations EX and EY. 
(ii) The variances Var(X) and Var(Y). 
(iii) The covariance Cov(X, Y) and the correlation coefficient $\rho(X, Y)$.
(iv) On the basis of part (iii), decide on the kind of correlation of the r.v.'s 
X and Y. 


3.14 Let X be a r.v. with finite expectation and finite and positive variance,
and set Y = aX + b, where a and b are constants and $a \ne 0$. Then show
that $|\rho(X, Y)| = 1$ and, indeed, $\rho(X, Y) = 1$ if and only if a > 0, and
$\rho(X, Y) = -1$ if and only if a < 0.


3.15 For any two r.v.'s X and Y, set U = X + Y and V = X - Y. Then show
that:
(i) $P(UV < 0) = P(|X| < |Y|)$.
(ii) If $EX^2 = EY^2 < \infty$, then E(UV) = 0.
(iii) If $EX^2 < \infty$, $EY^2 < \infty$ and Var(X) = Var(Y), then the r.v.'s U and
V are uncorrelated.


3.16 Let X and Y be r.v.'s with finite second moments $EX^2$, $EY^2$, and Var(X) >
0. Suppose we know X and we wish to predict Y in terms of X through the
linear relationship $Y = \alpha X + \beta$, where $\alpha$ and $\beta$ are (unknown) constants.
Further, suppose there exist values $\hat{\alpha}$ and $\hat{\beta}$ of $\alpha$ and $\beta$, respectively,
for which the expectation of the square difference $[Y - (\hat{\alpha}X + \hat{\beta})]^2$ is
minimum. Then $\hat{Y} = \hat{\alpha}X + \hat{\beta}$ is called the best linear predictor of Y
in terms of X (when the criterion of optimality is that of minimizing
$E[Y - (\alpha X + \beta)]^2$ over all $\alpha$ and $\beta$). Then show that $\hat{\alpha}$ and $\hat{\beta}$ are given as
follows:

$$\hat{\alpha} = \frac{\sigma_Y}{\sigma_X}\rho(X, Y), \qquad \hat{\beta} = EY - \hat{\alpha}EX,$$

where $\sigma_X$ and $\sigma_Y$ are the s.d.'s of the r.v.'s X and Y, respectively.


3.17 Justify the statement made in relation (10), for both the discrete and the 
continuous case. 


4.4 Some Generalizations to k Random Variables


If instead of two r.v.'s X and Y we have k r.v.'s $X_1, \ldots, X_k$, most of the
concepts defined and results obtained in the previous sections are carried over
to the k-dimensional case in a straightforward way. Thus, the joint probability
distribution of $(X_1, \ldots, X_k)$, to be denoted by $P_{X_1,\ldots,X_k}$, is defined by:
$P_{X_1,\ldots,X_k}(B) = P[(X_1, \ldots, X_k) \in B]$, $B \subseteq \Re^k = \Re \times \cdots \times \Re$ (k factors), and their
joint d.f. is: $F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = P(X_1 \le x_1, \ldots, X_k \le x_k)$, $x_1, \ldots, x_k \in \Re$.
The obvious versions of properties #1 and #3 stated in Section 4.1 hold here
too; also, a suitable version of property #2 holds, but we shall not insist on
it. The joint p.d.f. of $X_1, \ldots, X_k$ is denoted by $f_{X_1,\ldots,X_k}$ and is defined in an
obvious manner. Thus, for the case the r.v.'s $X_1, \ldots, X_k$ are discrete taking
on respective values $x_{1i}, \ldots, x_{ki}$, we have $f_{X_1,\ldots,X_k}(x_{1i}, \ldots, x_{ki}) = P(X_1 =
x_{1i}, \ldots, X_k = x_{ki})$ and 0 otherwise. Then, for $B \subseteq \Re^k$, $P[(X_1, \ldots, X_k) \in B] =
\sum f_{X_1,\ldots,X_k}(x_{1i}, \ldots, x_{ki})$, where the summation extends over all $(x_{1i}, \ldots, x_{ki}) \in B$.
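For the continuous case, the joint p.d.f. is a nonnegative function such that

$$F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = \int_{-\infty}^{x_k}\cdots\int_{-\infty}^{x_1} f_{X_1,\ldots,X_k}(t_1, \ldots, t_k)\,dt_1 \cdots dt_k, \quad x_1, \ldots, x_k \in \Re.$$

As in the 2-dimensional case, $\frac{\partial^k}{\partial x_1 \cdots \partial x_k} F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)$ at
continuity points $(x_1, \ldots, x_k)$ of $f_{X_1,\ldots,X_k}$. In Section 4.5, three concrete
examples will be presented, one for the discrete case, and two for the continuous
case. In the present k-dimensional case, there are many marginal d.f.'s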
and p.d.f.'s. Thus, if in $F_{X_1,\ldots,X_k}(x_1, \ldots, x_k)$, t of the x's, $x_{j_1}, \ldots, x_{j_t}$, are replaced
by $+\infty$ (in the sense they are let to tend to $+\infty$), then what is left is the
marginal joint d.f. of the r.v.'s $X_{i_1}, \ldots, X_{i_s}$, $F_{X_{i_1},\ldots,X_{i_s}}$, where s + t = k. Likewise,
if in $f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)$, $x_{j_1}, \ldots, x_{j_t}$ are eliminated through summation
(for the discrete case) or integration (for the continuous case), what is left
is the marginal joint p.d.f. of the r.v.'s $X_{i_1}, \ldots, X_{i_s}$, $f_{X_{i_1},\ldots,X_{i_s}}$. Combining
joint and marginal joint p.d.f.'s, as in the 2-dimensional case, we obtain a variety
of conditional p.d.f.'s. Thus, for example,

$$f_{X_{j_1},\ldots,X_{j_t} \mid X_{i_1},\ldots,X_{i_s}}(x_{j_1}, \ldots, x_{j_t} \mid x_{i_1}, \ldots, x_{i_s}) = \frac{f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)}{f_{X_{i_1},\ldots,X_{i_s}}(x_{i_1}, \ldots, x_{i_s})}.$$


Utilizing conditional p.d.f.'s, we can define conditional expectations and
conditional variances, as in the 2-dimensional case (see relations (2), (3) and (5),
(6)). For a (real-valued) function g defined on $\Re^k$, the expectation of the r.v.
$g(X_1, \ldots, X_k)$ is defined in a way analogous to that in (8) for the 2-dimensional
case, and the validity of properties (9) and (10) is immediate. In particular,
provided the expectations involved exist:

$$E(c_1X_1 + \cdots + c_kX_k + d) = c_1EX_1 + \cdots + c_kEX_k + d, \quad c_1, \ldots, c_k, d \ \text{constants}. \tag{41}$$

By choosing $g(x_1, \ldots, x_k) = \exp(t_1x_1 + \cdots + t_kx_k)$, $t_1, \ldots, t_k \in \Re$, the resulting
expectation (assuming it is finite) is the joint m.g.f. of $X_1, \ldots, X_k$; i.e.,

$$M_{X_1,\ldots,X_k}(t_1, \ldots, t_k) = Ee^{t_1X_1+\cdots+t_kX_k}, \quad (t_1, \ldots, t_k) \in C \subseteq \Re^k. \tag{42}$$

The appropriate versions of properties (21) and (23) become here:

$$M_{c_1X_1+d_1,\ldots,c_kX_k+d_k}(t_1, \ldots, t_k) = e^{d_1t_1+\cdots+d_kt_k} M_{X_1,\ldots,X_k}(c_1t_1, \ldots, c_kt_k), \tag{43}$$

where $c_1, \ldots, c_k$ and $d_1, \ldots, d_k$ are constants, and:

$$\frac{\partial^{n_1+\cdots+n_k}}{\partial t_1^{n_1}\cdots\partial t_k^{n_k}} M_{X_1,\ldots,X_k}(t_1, \ldots, t_k)\Big|_{t_1=\cdots=t_k=0} = E\left(X_1^{n_1}\cdots X_k^{n_k}\right), \tag{44}$$

for $\ge 0$ integers $n_1, \ldots, n_k$.


REMARK 3 Relation (44) demonstrates the joint moment generating property
of the joint m.g.f. The joint m.g.f. can also be used for recovering the joint
distribution of the r.v.'s $X_1, \ldots, X_k$ as indicated in Remark 2.
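Finally, the appropriate versions of relations (39) and (40) become here,
by setting $\sigma_{X_i}^2 = \mathrm{Var}(X_i)$, $i = 1, \ldots, k$:

$$\mathrm{Var}(X_1 + \cdots + X_k) = \sum_{i=1}^{k} \sigma_{X_i}^2 + 2\sum_{1 \le i < j \le k} \mathrm{Cov}(X_i, X_j) = \sum_{i=1}^{k} \sigma_{X_i}^2 + 2\sum_{1 \le i < j \le k} \sigma_{X_i}\sigma_{X_j}\rho(X_i, X_j), \tag{45}$$

and

$$\mathrm{Var}(X_1 + \cdots + X_k) = \sum_{i=1}^{k} \sigma_{X_i}^2, \ \text{if the } X_i\text{'s are pairwise uncorrelated;} \tag{46}$$

i.e., $\rho(X_i, X_j) = 0$ for $i \ne j$.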


4.1 If the r.v.'s $X_1, X_2, X_3$ have the joint p.d.f. $f_{X_1,X_2,X_3}(x_1, x_2, x_3) =
c^3 e^{-c(x_1+x_2+x_3)}$, $x_1 > 0$, $x_2 > 0$, $x_3 > 0$ (c > 0), determine:
(i) The constant c.
(ii) The marginal p.d.f.'s $f_{X_1}$, $f_{X_2}$, and $f_{X_3}$.
(iii) The joint conditional p.d.f. of $X_1$ and $X_2$, given $X_3$.
(iv) The conditional p.d.f. of $X_1$, given $X_2$ and $X_3$.


4.2 Determine the joint m.g.f. of the r.v.'s $X_1, X_2, X_3$ with p.d.f.
$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = c^3 e^{-c(x_1+x_2+x_3)}$, $x_1 > 0$, $x_2 > 0$, $x_3 > 0$ (c any positive
constant, see also Exercise 4.1).


4.3 (Cramér-Wold device) Show that if we know the joint distribution of the
r.v.'s $X_1, \ldots, X_n$, then we can determine the distribution of any linear
combination $c_1X_1 + \cdots + c_nX_n$ of $X_1, \ldots, X_n$, where $c_1, \ldots, c_n$ are constants.
Conversely, if we know the distribution of all linear combinations just
described, then we can determine the joint distribution of $X_1, \ldots, X_n$.


4.4 If the r.v.'s $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ have finite second moments, then
show that:

$$\mathrm{Cov}\left(\sum_{i=1}^{m} X_i, \sum_{j=1}^{n} Y_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{n} \mathrm{Cov}(X_i, Y_j).$$





4.5 The Multinomial, the Bivariate Normal, and the Multivariate Normal Distributions


In this section, we introduce and study to some extent three multidimensional
distributions; they are the Multinomial distribution, the 2-dimensional
Normal or Bivariate Normal distribution, and the k-dimensional Normal
distribution.

4.5.1 Multinomial Distribution


A multinomial experiment is a straightforward generalization of a binomial
experiment, where, instead of 2, there are k (mutually exclusive) possible
outcomes, $O_1, \ldots, O_k$, say, occurring with respective probabilities $p_1, \ldots, p_k$.
Simple examples of multinomial experiments are those of rolling a die (with
6 possible outcomes); selecting (with replacement) r balls from a collection
of $n_1 + \cdots + n_k$ balls, so that $n_i$ balls have the number i written on them, i =
1, ..., k; selecting (with replacement) r objects out of a collection of objects
of which $n_1$ are in good condition, $n_2$ have minor defects, and $n_3$ have serious
defects, etc. Suppose a multinomial experiment is carried out independently
n times and the probabilities $p_1, \ldots, p_k$ remain the same throughout. Denote
by $X_i$ the r.v. of the number of times outcome $O_i$ occurs, i = 1, ..., k. Then
the joint p.d.f. of $X_1, \ldots, X_k$ is given by:

$$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}, \tag{47}$$

where $x_1, \ldots, x_k$ are $\ge 0$ integers with $x_1 + \cdots + x_k = n$, and, of course,
$0 < p_i < 1$, i = 1, ..., k, $p_1 + \cdots + p_k = 1$. The distribution given by (47)
is the Multinomial distribution with parameters n and $p_1, \ldots, p_k$, and the
r.v.'s $X_1, \ldots, X_k$ are said to have the Multinomial distribution with these
parameters. That the right-hand side of (47) is the right formula for the joint
probabilities $P(X_1 = x_1, \ldots, X_k = x_k)$ ensues as follows: By independence,
the probability that $O_i$ occurs $x_i$ times, i = 1, ..., k, in specified positions,
is given by: $p_1^{x_1} \cdots p_k^{x_k}$, regardless of the positions of occurrence of the $O_i$'s. The
number of different ways of choosing the $x_i$ positions for the occurrence of $O_i$, i =
1, ..., k, is equal to: $\binom{n}{x_1}\binom{n - x_1}{x_2} \cdots \binom{n - x_1 - \cdots - x_{k-1}}{x_k}$. Writing out each term in
factorial form and making the obvious cancellations, we arrive at: $n!/(x_1! \cdots x_k!)$
(see also Exercise 5.1). For illustrative purposes, let us consider the following
example.


EXAMPLE 24 A fair die is rolled independently 10 times. Find the probability
that faces #1 through #6 occur the following respective number of times:
2, 1, 3, 1, 2, and 1.


DISCUSSION By letting $X_i$ be the r.v. denoting the number of occurrences
of face $i$, $i = 1, \ldots, 6$, we have:

\[
f_{X_1,\ldots,X_6}(2, 1, 3, 1, 2, 1) = \frac{10!}{2!\,1!\,3!\,1!\,2!\,1!}\left(\frac{1}{6}\right)^{10} = \frac{4{,}725}{1{,}889{,}568} \simeq 0.003.
\]
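The arithmetic in Example 24 can be checked mechanically. The following is a minimal sketch of formula (47) in Python with exact rational arithmetic; the function name `multinomial_pmf` is an illustrative choice and not part of the text.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """Joint probability P(X_1 = x_1, ..., X_k = x_k) from formula (47)."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # n! / (x_1! ... x_k!) stays an integer at every step
    prob = Fraction(coeff)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return prob


# Example 24: a fair die rolled 10 times, faces occurring 2, 1, 3, 1, 2, 1 times.
example_24 = multinomial_pmf([2, 1, 3, 1, 2, 1], [Fraction(1, 6)] * 6)
print(example_24, float(example_24))
```

Working with `Fraction` avoids rounding, so the result can be compared directly with the fraction 4,725/1,889,568 given in the discussion.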


In a Multinomial distribution, all marginal p.d.f.'s and all conditional p.d.f.'s 
are also Multinomial. More precisely, we have the following result. 


THEOREM 4
Let $X_1, \ldots, X_k$ be Multinomially distributed with parameters $n$ and
$p_1, \ldots, p_k$, and for $1 \le s < k$, let $1 \le i_1 < i_2 < \cdots < i_s \le k$,
$Y = n - (X_{i_1} + \cdots + X_{i_s})$, and $q = 1 - (p_{i_1} + \cdots + p_{i_s})$. Then:

(i) The r.v.'s $X_{i_1}, \ldots, X_{i_s}, Y$ are distributed Multinomially with
parameters $n$ and $p_{i_1}, \ldots, p_{i_s}, q$.

(ii) The conditional joint distribution of $X_{j_1}, \ldots, X_{j_t}$, given $X_{i_1} = x_{i_1}, \ldots,
X_{i_s} = x_{i_s}$, is Multinomial with parameters $n - r$ and $p_{j_1}/q, \ldots, p_{j_t}/q$,
where $r = x_{i_1} + \cdots + x_{i_s}$ and $t = k - s$.











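PROOF
(i) For $\ge 0$ integers $x_{i_1}, \ldots, x_{i_s}$ with $x_{i_1} + \cdots + x_{i_s} = r \le n$, we have:
\begin{align*}
f_{X_{i_1},\ldots,X_{i_s}}(x_{i_1}, \ldots, x_{i_s}) &= P(X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) \\
&= P(X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}, Y = n - r) \\
&= \frac{n!}{x_{i_1}! \cdots x_{i_s}!\,(n - r)!}\, p_{i_1}^{x_{i_1}} \cdots p_{i_s}^{x_{i_s}}\, q^{n-r}.
\end{align*}
(ii) For $\ge 0$ integers $x_{j_1}, \ldots, x_{j_t}$ with $x_{j_1} + \cdots + x_{j_t} = n - r$, we have:
\begin{align*}
& f_{X_{j_1},\ldots,X_{j_t} \mid X_{i_1},\ldots,X_{i_s}}(x_{j_1}, \ldots, x_{j_t} \mid x_{i_1}, \ldots, x_{i_s}) \\
&\quad = P(X_{j_1} = x_{j_1}, \ldots, X_{j_t} = x_{j_t} \mid X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) \\
&\quad = P(X_{j_1} = x_{j_1}, \ldots, X_{j_t} = x_{j_t}, X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) \,/\, P(X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) \\
&\quad = \frac{\dfrac{n!}{x_{j_1}! \cdots x_{j_t}!\, x_{i_1}! \cdots x_{i_s}!}\, p_{j_1}^{x_{j_1}} \cdots p_{j_t}^{x_{j_t}} \times p_{i_1}^{x_{i_1}} \cdots p_{i_s}^{x_{i_s}}}{\dfrac{n!}{x_{i_1}! \cdots x_{i_s}!\,(n - r)!}\, p_{i_1}^{x_{i_1}} \cdots p_{i_s}^{x_{i_s}}\, q^{n-r}} \\
&\quad = \frac{(n - r)!}{x_{j_1}! \cdots x_{j_t}!}\, (p_{j_1}/q)^{x_{j_1}} \cdots (p_{j_t}/q)^{x_{j_t}}. \quad ▲
\end{align*}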


EXAMPLE 25 In reference to Example 24, calculate: $P(X_2 = X_4 = X_6 = 2)$ and
$P(X_1 = X_3 = 1, X_5 = 2 \mid X_2 = X_4 = X_6 = 2)$.








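DISCUSSION Here $n = 10$, $r = 6$, $p_2 = p_4 = p_6 = \frac{1}{6}$ and $q = 1 - \frac{3}{6} = \frac{1}{2}$.
Thus:

\[
P(X_2 = X_4 = X_6 = 2) = \frac{10!}{2!\,2!\,2!\,4!}\left(\frac{1}{6}\right)^{6}\left(\frac{1}{2}\right)^{4} = \frac{4{,}725}{186{,}624} \simeq 0.025,
\]

and:

\[
P(X_1 = X_3 = 1, X_5 = 2 \mid X_2 = X_4 = X_6 = 2) = \frac{4!}{1!\,1!\,2!}\left(\frac{1/6}{1/2}\right)^{4} = \frac{4}{27} \simeq 0.148.
\]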
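Both parts of Theorem 4 can be checked against Example 25 by brute force. The sketch below (helper name `multinomial_pmf` is illustrative) sums the six-outcome joint p.d.f. over the unobserved faces and compares the result with the grouped Multinomial of part (i), then forms the conditional probability as a ratio and compares it with part (ii).

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Formula (47), evaluated exactly."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)
    prob = Fraction(coeff)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return prob


p = Fraction(1, 6)

# Brute force: fix x2 = x4 = x6 = 2 and sum over all (x1, x3, x5) with total 4.
marginal = sum(
    multinomial_pmf([x1, 2, x3, 2, x5, 2], [p] * 6)
    for x1, x3, x5 in product(range(5), repeat=3)
    if x1 + x3 + x5 == 4
)

# Theorem 4(i): (X2, X4, X6, Y) is Multinomial with parameters 10 and 1/6, 1/6, 1/6, 1/2.
grouped = multinomial_pmf([2, 2, 2, 4], [p, p, p, Fraction(1, 2)])

# Theorem 4(ii): the conditional probability as a ratio, and as a Multinomial
# with parameters n - r = 4 and probabilities (1/6)/(1/2) = 1/3 each.
joint = multinomial_pmf([1, 2, 1, 2, 2, 2], [p] * 6)
conditional_ratio = joint / marginal
conditional_thm = multinomial_pmf([1, 1, 2], [Fraction(1, 3)] * 3)

print(marginal == grouped, float(grouped))
print(conditional_ratio == conditional_thm, float(conditional_thm))
```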


EXAMPLE 26 In a genetic experiment, two different varieties of a certain species are crossed
and a specific characteristic of the offspring can occur only at three levels, A,
B, and C, say. According to a proposed model, the probabilities for A, B, and
C are $\frac{1}{12}$, $\frac{3}{12}$, and $\frac{8}{12}$, respectively. Out of 60 offspring, calculate:

(i) The probability that 6, 18, and 36 fall into levels A, B, and C, respectively.
(ii) The (conditional) probability that 6 and 18 fall into levels A and B,
respectively, given that 36 fall into level C.


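DISCUSSION

(i) Formula (47) applies with $n = 60$, $k = 3$, $p_1 = \frac{1}{12}$, $p_2 = \frac{3}{12}$, $p_3 = \frac{8}{12}$,
$x_1 = 6$, $x_2 = 18$, $x_3 = 36$ and yields:

\[
P(X_1 = 6, X_2 = 18, X_3 = 36) = \frac{60!}{6!\,18!\,36!}\left(\frac{1}{12}\right)^{6}\left(\frac{3}{12}\right)^{18}\left(\frac{8}{12}\right)^{36} \simeq 0.011.
\]

(ii) Here Theorem 4(ii) applies with $s = 1$, $t = 2$, $x_{i_1} = x_3 = 36$, $x_{j_1} = x_1 = 6$,
$x_{j_2} = x_2 = 18$, $r = 36$, so that $n - r = 60 - 36 = 24$, $q = 1 - p_3 = 1 - \frac{8}{12} = \frac{4}{12}$,
and yields:

\begin{align*}
P(X_1 = 6, X_2 = 18 \mid X_3 = 36) &= \frac{(n - r)!}{x_1!\,x_2!}\left(\frac{p_1}{q}\right)^{x_1}\left(\frac{p_2}{q}\right)^{x_2} \\
&= \frac{24!}{6!\,18!}\left(\frac{1}{4}\right)^{6}\left(\frac{3}{4}\right)^{18} \\
&= \binom{24}{6}\left(\frac{1}{4}\right)^{6}\left(\frac{3}{4}\right)^{18} \\
&= 0.1852 \quad \text{(from the Binomial tables).}
\end{align*}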
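As a cross-check of Example 26, the sketch below evaluates both probabilities exactly and confirms that the conditional probability from Theorem 4(ii) equals the ratio of the joint probability to the marginal probability $P(X_3 = 36)$, where $X_3 \sim B(60, 8/12)$ by Theorem 4(i). Variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

p1, p2, p3 = Fraction(1, 12), Fraction(3, 12), Fraction(8, 12)

# (i) Formula (47) with n = 60 and x1 = 6, x2 = 18, x3 = 36.
coeff = factorial(60) // (factorial(6) * factorial(18) * factorial(36))
joint = coeff * p1**6 * p2**18 * p3**36

# (ii) Theorem 4(ii): given X3 = 36, (X1, X2) is Multinomial with parameters
# 24 and 1/4, 3/4; equivalently X1 is Binomial B(24, 1/4).
q = 1 - p3
conditional = comb(24, 6) * (p1 / q) ** 6 * (p2 / q) ** 18

# Cross-check through the marginal of X3, which is B(60, 8/12).
marginal_x3 = comb(60, 36) * p3**36 * (1 - p3) ** 24

print(float(joint), float(conditional), conditional == joint / marginal_x3)
```

The exact conditional value differs from the tabulated 0.1852 only in the fourth decimal place, which is consistent with rounding in the Binomial tables.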





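An application of formula (42) gives the joint m.g.f. of $X_1, \ldots, X_k$ as follows,
where the summation is over all $\ge 0$ integers $x_1, \ldots, x_k$ with $x_1 + \cdots + x_k = n$:

\begin{align*}
M_{X_1,\ldots,X_k}(t_1, \ldots, t_k) &= \sum e^{t_1 x_1 + \cdots + t_k x_k}\, \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k} \\
&= \sum \frac{n!}{x_1! \cdots x_k!}\, (p_1 e^{t_1})^{x_1} \cdots (p_k e^{t_k})^{x_k} \\
&= (p_1 e^{t_1} + \cdots + p_k e^{t_k})^n; \text{ i.e.,}
\end{align*}

\[
M_{X_1,\ldots,X_k}(t_1, \ldots, t_k) = (p_1 e^{t_1} + \cdots + p_k e^{t_k})^n, \quad t_1, \ldots, t_k \in \Re. \tag{48}
\]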


By means of (44) and (48), we can find the $Cov(X_i, X_j)$ and the $\rho(X_i, X_j)$
for any $1 \le i < j \le k$. Indeed, $EX_i = np_i$, $EX_j = np_j$, $Var(X_i) = np_i(1 - p_i)$,
$Var(X_j) = np_j(1 - p_j)$ and $E(X_i X_j) = n(n - 1)p_i p_j$. Therefore:

\[
Cov(X_i, X_j) = -np_i p_j \quad \text{and} \quad \rho(X_i, X_j) = -\left[\frac{p_i p_j}{(1 - p_i)(1 - p_j)}\right]^{1/2} \tag{49}
\]

(see Exercise 5.4 for details).
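The moment formulas leading to (49) can be verified by exact enumeration on a small case. The sketch below uses $n = 5$ and probabilities 1/2, 1/3, 1/6; these values are assumptions chosen only for illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Formula (47), evaluated exactly."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)
    prob = Fraction(coeff)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return prob


n = 5
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative values

# Enumerate every (x1, x2, x3) with x1 + x2 + x3 = n and tabulate the joint p.d.f.
support = [c for c in product(range(n + 1), repeat=3) if sum(c) == n]
pmf = {c: multinomial_pmf(c, probs) for c in support}


def expect(g):
    """Expectation of g(X1, X2, X3) under the tabulated joint p.d.f."""
    return sum(g(c) * pr for c, pr in pmf.items())


ex1 = expect(lambda c: c[0])
ex2 = expect(lambda c: c[1])
var1 = expect(lambda c: c[0] ** 2) - ex1**2
cov12 = expect(lambda c: c[0] * c[1]) - ex1 * ex2

print(ex1, var1, cov12)
```

The negative covariance reflects the constraint $X_1 + \cdots + X_k = n$: a larger count for one outcome leaves fewer trials for the others.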


4.5.2 Bivariate Normal Distribution


Figure 4.6 Graphs of the p.d.f. of the Bivariate Normal Distribution: (a) Centered at the Origin; (b) Centered Elsewhere in the (x, y)-plane


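The joint distribution of the r.v.'s $X$ and $Y$ is said to be the Bivariate Normal
distribution with parameters $\mu_1, \mu_2$ in $\Re$, $\sigma_1, \sigma_2$ positive and $\rho \in (-1, 1)$, if the
joint p.d.f. is given by the formula:

\[
f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}, \quad x, y \in \Re, \tag{50}
\]

where

\[
q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]. \tag{51}
\]

This distribution is also referred to as 2-dimensional Normal. The shape of
$f_{X,Y}$ looks like a bell sitting on the $xy$-plane and whose highest point is located
at the point $(\mu_1, \mu_2, 1/(2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}))$ (see Figure 4.6).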
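Formulas (50) and (51) translate directly into code. The following is a minimal sketch; the function name and the parameter values are assumptions chosen for illustration. It evaluates the p.d.f. at $(\mu_1, \mu_2)$, where $q = 0$ and the density attains its maximum, and at a nearby point.

```python
import math


def bivariate_normal_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Joint p.d.f. (50) with the quadratic form q from (51); needs |rho| < 1."""
    u = (x - mu1) / sigma1
    v = (y - mu2) / sigma2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho**2)
    norm_const = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho**2)
    return math.exp(-q / 2) / norm_const


# Illustrative parameter values (assumed, not taken from the text).
params = dict(mu1=1.0, mu2=-2.0, sigma1=2.0, sigma2=0.5, rho=0.6)

peak = bivariate_normal_pdf(params["mu1"], params["mu2"], **params)
nearby = bivariate_normal_pdf(params["mu1"] + 0.1, params["mu2"] - 0.1, **params)
print(peak, nearby)
```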








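That $f_{X,Y}$ integrates to 1 and therefore is a p.d.f. is seen by rewriting it in a
convenient way. Specifically,

\begin{align*}
(1 - \rho^2)\, q &= \left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 \\
&= \left(\frac{y - \mu_2}{\sigma_2}\right)^2 - 2\left(\rho\,\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\rho\,\frac{x - \mu_1}{\sigma_1}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2 \\
&= \left(\frac{y - \mu_2}{\sigma_2} - \rho\,\frac{x - \mu_1}{\sigma_1}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2. \tag{52}
\end{align*}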













Chapter 4 Joint and Conditional p.d.f.’s, Conditional Expectation and Variance 





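Furthermore,
\begin{align*}
\frac{y - \mu_2}{\sigma_2} - \rho\,\frac{x - \mu_1}{\sigma_1} &= \frac{y - \mu_2}{\sigma_2} - \frac{1}{\sigma_2} \cdot \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1) \\
&= \frac{1}{\sigma_2}\left\{y - \left[\mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1)\right]\right\} \\
&= \frac{y - b_x}{\sigma_2}, \quad \text{where } b_x = \mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1)
\end{align*}
(see also Exercise 5.6).
Therefore, the right-hand side of (52) is equal to:
\[
\left(\frac{y - b_x}{\sigma_2}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2,
\]
and hence the exponent becomes:
\[
-\frac{(x - \mu_1)^2}{2\sigma_1^2} - \frac{(y - b_x)^2}{2\bigl(\sigma_2\sqrt{1 - \rho^2}\bigr)^2}.
\]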


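Then the joint p.d.f. may be rewritten as follows:

\[
f_{X,Y}(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x - \mu_1)^2}{2\sigma_1^2}} \times \frac{1}{\sqrt{2\pi}\,\bigl(\sigma_2\sqrt{1 - \rho^2}\bigr)}\, e^{-\frac{(y - b_x)^2}{2(\sigma_2\sqrt{1 - \rho^2})^2}}. \tag{53}
\]

The first factor on the right-hand side of (53) is the p.d.f. of $N(\mu_1, \sigma_1^2)$ and the
second factor is the p.d.f. of $N(b_x, (\sigma_2\sqrt{1 - \rho^2})^2)$. Therefore, integration with
respect to $y$ produces the marginal $N(\mu_1, \sigma_1^2)$ distribution, which, of course,
integrates to 1. So, we have established the following two facts:
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$, and

\[
X \sim N(\mu_1, \sigma_1^2), \text{ and, by symmetry, } Y \sim N(\mu_2, \sigma_2^2). \tag{54}
\]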


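The results recorded in (54) also reveal the special significance of the
parameters $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$. Namely, they are the means and the variances of the
(normally distributed) r.v.'s $X$ and $Y$, respectively. Relations (53) and (54) also
provide immediately the conditional p.d.f. $f_{Y|X}$; namely,

\[
f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi}\,\bigl(\sigma_2\sqrt{1 - \rho^2}\bigr)}\, \exp\left[-\frac{(y - b_x)^2}{2\bigl(\sigma_2\sqrt{1 - \rho^2}\bigr)^2}\right].
\]

Thus, in obvious notation:

\[
Y \mid X = x \sim N\bigl(b_x, (\sigma_2\sqrt{1 - \rho^2})^2\bigr), \quad b_x = \mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1), \tag{55}
\]
and by symmetry:
\[
X \mid Y = y \sim N\bigl(b_y, (\sigma_1\sqrt{1 - \rho^2})^2\bigr), \quad b_y = \mu_1 + \frac{\rho\sigma_1}{\sigma_2}(y - \mu_2). \tag{56}
\]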


In Figure 4.7, the conditional p.d.f. $f_{Y|X}(\cdot \mid x)$ is depicted for three values of
$x$: $x = 5$, 10, and 15.

Figure 4.7 Conditional Probability Density Functions of the Bivariate Normal Distribution
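The factorization of the joint p.d.f. into a marginal and a conditional Normal p.d.f. can be checked numerically. The sketch below computes $b_x$ and the conditional s.d. $\sigma_2\sqrt{1 - \rho^2}$ for $x = 5$, 10, and 15, and compares the product of the two Normal factors with the joint p.d.f. evaluated directly from (50) and (51). The parameter values are assumptions for illustration; the text does not state the ones behind Figure 4.7.

```python
import math


def normal_pdf(z, mean, sd):
    """p.d.f. of N(mean, sd^2) at z."""
    return math.exp(-((z - mean) ** 2) / (2 * sd**2)) / (math.sqrt(2 * math.pi) * sd)


def conditional_y_given_x(x, mu1, mu2, sigma1, sigma2, rho):
    """Mean b_x and s.d. of the conditional distribution of Y given X = x."""
    b_x = mu2 + rho * sigma2 / sigma1 * (x - mu1)
    sd = sigma2 * math.sqrt(1 - rho**2)
    return b_x, sd


def joint_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Joint p.d.f. written directly from (50) and (51)."""
    u = (x - mu1) / sigma1
    v = (y - mu2) / sigma2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho**2))


# Assumed parameters, for illustration only.
mu1, mu2, sigma1, sigma2, rho = 10.0, 20.0, 4.0, 3.0, 0.5

results = []
for x in (5.0, 10.0, 15.0):
    b_x, sd = conditional_y_given_x(x, mu1, mu2, sigma1, sigma2, rho)
    y = b_x + 1.0  # any y works; this one is near the conditional mean
    product_form = normal_pdf(x, mu1, sigma1) * normal_pdf(y, b_x, sd)
    direct_form = joint_pdf(x, y, mu1, mu2, sigma1, sigma2, rho)
    results.append((x, b_x, sd, product_form, direct_form))
    print(x, b_x, sd, math.isclose(product_form, direct_form, rel_tol=1e-9))
```

Note that the conditional mean changes linearly with $x$, while the conditional variance does not depend on $x$ at all.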

















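Formulas (53), (54), and (56) also allow us to calculate easily the covariance
and the correlation coefficient of $X$ and $Y$. Indeed, by (53):

\begin{align*}
E(XY) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x f_X(x)\left[\int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy\right] dx \\
&= \int_{-\infty}^{\infty} x f_X(x)\, b_x\, dx = \int_{-\infty}^{\infty} x f_X(x)\left[\mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1)\right] dx \\
&= \mu_1\mu_2 + \rho\sigma_1\sigma_2
\end{align*}
(see also Exercise 5.7). Since we already know that $EX = \mu_1$, $EY = \mu_2$, and
$Var(X) = \sigma_1^2$, $Var(Y) = \sigma_2^2$, we obtain:
\[
Cov(X, Y) = E(XY) - (EX)(EY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2 - \mu_1\mu_2 = \rho\sigma_1\sigma_2,
\]
and therefore $\rho(X, Y) = \frac{\rho\sigma_1\sigma_2}{\sigma_1\sigma_2} = \rho$. Thus, we have:

\[
Cov(X, Y) = \rho\sigma_1\sigma_2 \quad \text{and} \quad \rho(X, Y) = \rho. \tag{57}
\]

Relation (57) reveals that the parameter $\rho$ in (50) is, actually, the correlation
coefficient of the r.v.'s $X$ and $Y$.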


EXAMPLE 27 If the r.v.'s $X_1$ and $X_2$ have the Bivariate Normal distribution with parameters
$\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$:

(i) Calculate the quantities: $E(c_1X_1 + c_2X_2)$, $Var(c_1X_1 + c_2X_2)$, where $c_1, c_2$
are constants.

(ii) What do the expressions in part (i) become for: $\mu_1 = -1$, $\mu_2 = 3$, $\sigma_1^2 = 4$, $\sigma_2^2 = 9$,
and $\rho = \frac{1}{2}$?





DISCUSSION

(i) $E(c_1X_1 + c_2X_2) = c_1EX_1 + c_2EX_2 = c_1\mu_1 + c_2\mu_2$, since $X_i \sim N(\mu_i, \sigma_i^2)$, so
that $EX_i = \mu_i$, $i = 1, 2$. Also,

\begin{align*}
Var(c_1X_1 + c_2X_2) &= c_1^2\sigma_{X_1}^2 + c_2^2\sigma_{X_2}^2 + 2c_1c_2\sigma_{X_1}\sigma_{X_2}\rho(X_1, X_2) \quad \text{(by (82))} \\
&= c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + 2c_1c_2\sigma_1\sigma_2\rho,
\end{align*}
since $X_i \sim N(\mu_i, \sigma_i^2)$, so that $Var(X_i) = \sigma_i^2$, $i = 1, 2$, and $\rho(X_1, X_2) = \rho$, by
(57).
(ii) Here $E(c_1X_1 + c_2X_2) = -c_1 + 3c_2$, and $Var(c_1X_1 + c_2X_2) = 4c_1^2 + 9c_2^2 +
2c_1c_2 \times 2 \times 3 \times \frac{1}{2} = 4c_1^2 + 9c_2^2 + 6c_1c_2$.
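The two formulas of part (i) are easy to wrap in a small helper. The sketch below applies them to the parameters of part (ii); the function name and the choice $c_1 = 2$, $c_2 = -1$ are assumptions made only to obtain concrete numbers.

```python
import math


def linear_combination_moments(c1, c2, mu1, mu2, var1, var2, rho):
    """Mean and variance of c1*X1 + c2*X2, as in part (i) of the discussion."""
    mean = c1 * mu1 + c2 * mu2
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    variance = c1**2 * var1 + c2**2 * var2 + 2 * c1 * c2 * sd1 * sd2 * rho
    return mean, variance


# Parameters of part (ii); c1 = 2 and c2 = -1 are illustrative choices.
c1, c2 = 2, -1
mean, variance = linear_combination_moments(c1, c2, mu1=-1, mu2=3, var1=4, var2=9, rho=0.5)

# The closed forms obtained in part (ii).
closed_form_mean = -c1 + 3 * c2
closed_form_variance = 4 * c1**2 + 9 * c2**2 + 6 * c1 * c2
print(mean, variance, closed_form_mean, closed_form_variance)
```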

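Finally, it can be seen by integration that the joint m.g.f. of $X$ and $Y$ is given
by the formula:

\[
M_{X,Y}(t_1, t_2) = \exp\left[\mu_1t_1 + \mu_2t_2 + \frac{1}{2}\left(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2\right)\right], \quad t_1, t_2 \in \Re; \tag{58}
\]

we choose not to pursue its justification (which can be found, e.g., in pages
158-159, in the book "A Course in Mathematical Statistics," 2nd edition (1997),
Academic Press, by G.G. Roussas). We see, however, easily that

\[
\frac{\partial}{\partial t_1} M_{X,Y}(t_1, t_2) = (\mu_1 + \sigma_1^2t_1 + \rho\sigma_1\sigma_2t_2)\, M_{X,Y}(t_1, t_2),
\]

and hence:
\begin{align*}
\frac{\partial^2}{\partial t_1 \partial t_2} M_{X,Y}(t_1, t_2) &= \rho\sigma_1\sigma_2\, M_{X,Y}(t_1, t_2) + (\mu_1 + \sigma_1^2t_1 + \rho\sigma_1\sigma_2t_2) \\
&\quad \times (\mu_2 + \sigma_2^2t_2 + \rho\sigma_1\sigma_2t_1)\, M_{X,Y}(t_1, t_2),
\end{align*}

which, evaluated at $t_1 = t_2 = 0$, yields: $\rho\sigma_1\sigma_2 + \mu_1\mu_2 = E(XY)$, as we have
already seen.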
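The relation between the mixed partial derivative of (58) at the origin and $E(XY)$ can be illustrated numerically. The sketch below approximates the derivative by a central finite difference; this is a numerical stand-in for the analytic differentiation in the text, and the parameter values and step size are assumptions.

```python
import math


def bivariate_normal_mgf(t1, t2, mu1, mu2, sigma1, sigma2, rho):
    """Joint m.g.f. (58) of the Bivariate Normal distribution."""
    quad = sigma1**2 * t1**2 + 2 * rho * sigma1 * sigma2 * t1 * t2 + sigma2**2 * t2**2
    return math.exp(mu1 * t1 + mu2 * t2 + 0.5 * quad)


def mixed_partial_at_origin(mgf, h=1e-3):
    """Central-difference approximation of the mixed partial derivative at (0, 0)."""
    return (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h * h)


# Illustrative parameter values (assumed).
mu1, mu2, sigma1, sigma2, rho = 1.0, 2.0, 1.0, 2.0, 0.5

approx = mixed_partial_at_origin(
    lambda t1, t2: bivariate_normal_mgf(t1, t2, mu1, mu2, sigma1, sigma2, rho)
)
exact = rho * sigma1 * sigma2 + mu1 * mu2  # E(XY) as derived in the text
print(approx, exact)
```

The finite difference agrees with $\rho\sigma_1\sigma_2 + \mu_1\mu_2$ only up to a truncation error of order $h^2$, so the comparison should be made with a tolerance.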


4.5.3 Multivariate Normal Distribution


The Multivariate Normal distribution is a generalization of the Bivariate
Normal distribution and can be defined in a number of ways; we choose the one
given here. To this end, for $k \ge 2$, let $\mu = (\mu_1, \ldots, \mu_k)$ be a vector of constants,
and let $\Sigma$ be a $k \times k$ nonsingular matrix, so that the inverse $\Sigma^{-1}$ exists and
the determinant $|\Sigma| \ne 0$. Finally, set $\mathbf{X}$ for the vector of r.v.'s $X_1, \ldots, X_k$; i.e.,
$\mathbf{X} = (X_1, \ldots, X_k)$ and $\mathbf{x} = (x_1, \ldots, x_k)$ for any point in $\Re^k$. Then, the joint p.d.f.
of the $X_i$'s, or the p.d.f. of the random vector $\mathbf{X}$, is said to be Multivariate
Normal, or $k$-Variate Normal, if it is given by the formula:

\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \mu)\Sigma^{-1}(\mathbf{x} - \mu)'\right], \quad \mathbf{x} \in \Re^k,
\]
where, it is to be recalled that "$'$" stands for transpose.

It can be seen that: $EX_i = \mu_i$, $Var(X_i) = \sigma_i^2$ is the $(i, i)$th element of $\Sigma$,
and $Cov(X_i, X_j)$ is the $(i, j)$th element of $\Sigma$, so that $\mu = (EX_1, \ldots, EX_k)$
and $\Sigma = (Cov(X_i, X_j))$, $i, j = 1, \ldots, k$. The quantities $\mu$ and $\Sigma$ are called the
parameters of the distribution. It can also be seen that the joint m.g.f. of the
$X_i$'s, or the m.g.f. of the random vector $\mathbf{X}$, is given by:

\[
M_{\mathbf{X}}(\mathbf{t}) = \exp\left(\mu\mathbf{t}' + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}'\right), \quad \mathbf{t} \in \Re^k.
\]

The $k$-Variate Normal distribution has properties similar to those of the
2-dimensional Normal distribution, and the latter is obtained from the former
by taking $\mu = (\mu_1, \mu_2)$ and
$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, where $\rho = \rho(X_1, X_2)$.

More relevant information can be found, e.g., in Chapter 18 of the reference
cited in the discussion of Example 27.
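The statement that the Bivariate Normal p.d.f. is the $k = 2$ case of the matrix formula can be confirmed numerically. The sketch below, written in plain Python with the $2 \times 2$ inverse in closed form, evaluates both expressions at one point; the function names, the parameter values, and the evaluation point are assumptions for illustration.

```python
import math


def bivariate_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate Normal p.d.f. in the scalar form of Section 4.5.2."""
    u = (x - mu1) / sigma1
    v = (y - mu2) / sigma2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho**2))


def matrix_form_pdf_k2(point, mu, cov):
    """k-Variate Normal p.d.f. for k = 2, using the closed-form 2 x 2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inverse = ((d / det, -b / det), (-c / det, a / det))
    diff = (point[0] - mu[0], point[1] - mu[1])
    quad = sum(diff[i] * inverse[i][j] * diff[j] for i in range(2) for j in range(2))
    k = 2
    return math.exp(-quad / 2) / ((2 * math.pi) ** (k / 2) * math.sqrt(det))


# Illustrative values (assumed).
mu1, mu2, sigma1, sigma2, rho = -1.0, 3.0, 2.0, 3.0, 0.5
cov = ((sigma1**2, rho * sigma1 * sigma2), (rho * sigma1 * sigma2, sigma2**2))

x, y = 0.5, 2.0
from_matrix = matrix_form_pdf_k2((x, y), (mu1, mu2), cov)
from_scalar = bivariate_pdf(x, y, mu1, mu2, sigma1, sigma2, rho)
print(from_matrix, from_scalar)
```

The agreement rests on $|\Sigma| = \sigma_1^2\sigma_2^2(1 - \rho^2)$, which produces both the normalizing constant and the factor $1/(1 - \rho^2)$ in the quadratic form.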


5.1 Show that

\[
\binom{n}{n_1}\binom{n - n_1}{n_2} \cdots \binom{n - n_1 - \cdots - n_{k-1}}{n_k} = \frac{n!}{n_1!\,n_2! \cdots n_k!}.
\]


5.2 In a store selling TV sets, it is known that 25% of the customers will 
purchase a TV set of brand A, 40% will purchase a TV set of brand B, and 
35% will just be browsing around. For a lot of 10 customers: 

(i) What is the probability that 2 will purchase a TV set of brand A, 3 will 
purchase a TV set of brand B, and 5 will purchase neither? 
(ii) If it is known that 6 customers did not purchase a TV set, what is the 
(conditional) probability that 1 of the rest will purchase a TV set of 
brand A and 3 will purchase a TV set of brand B? 


5.3 Human blood occurs in 4 types termed A, B, AB, and O with respective
frequencies $p_A = 0.40$, $p_B = 0.10$, $p_{AB} = 0.05$, and $p_O = 0.45$. If $n$
donors participate in a blood drive, denote by $X_A$, $X_B$, $X_{AB}$, and $X_O$ the
numbers of donors with respective blood types A, B, AB, and O. Then
$X_A$, $X_B$, $X_{AB}$, and $X_O$ are r.v.'s having the Multinomial distribution with
parameters $n$ and $p_A$, $p_B$, $p_{AB}$, $p_O$. Write out the appropriate formulas for
the following probabilities:

(i) $P(X_A = x_A, X_B = x_B, X_{AB} = x_{AB}, X_O = x_O)$ for $x_A$, $x_B$, $x_{AB}$, and
$x_O$ nonnegative integers with $x_A + x_B + x_{AB} + x_O = n$.
(ii) $P(X_A = x_A, X_B = x_B, X_{AB} = x_{AB})$.
(iii) $P(X_A = x_A, X_B = x_B)$.
(iv) $P(X_A = x_A)$.
(v) $P(X_A = x_A, X_B = x_B, X_{AB} = x_{AB} \mid X_O = x_O)$.
(vi) $P(X_A = x_A, X_B = x_B \mid X_{AB} = x_{AB}, X_O = x_O)$.
(vii) $P(X_A = x_A \mid X_B = x_B, X_{AB} = x_{AB}, X_O = x_O)$.
(viii) Give numerical answers to parts (i)-(vii), if $n = 20$, and $x_A = 8$,
$x_B = 2$, $x_{AB} = 1$, $x_O = 9$.





5.4 In conjunction with the Multinomial distribution, show that:

\[
EX_i = np_i, \quad EX_j = np_j, \quad Var(X_i) = np_i(1 - p_i), \quad Var(X_j) = np_j(1 - p_j),
\]
\[
Cov(X_i, X_j) = -np_ip_j \quad \text{and} \quad \rho(X_i, X_j) = -\frac{p_ip_j}{[p_i(1 - p_i)p_j(1 - p_j)]^{1/2}}.
\]
5.5 Refer to Exercises 5.3 and 5.4, and for $n = 20$, calculate the quantities:
$EX_A$, $EX_B$, $EX_{AB}$, $EX_O$; $Var(X_A)$, $Var(X_B)$, $Var(X_{AB})$,
$Var(X_O)$; $Cov(X_A, X_B)$, $Cov(X_A, X_{AB})$, $Cov(X_A, X_O)$;
$\rho(X_A, X_B)$, $\rho(X_A, X_{AB})$, $\rho(X_A, X_O)$.


5.6 Elaborate on the expressions in (51), as well as the expressions following 
(51). 


5.7 If the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with
parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, show that $E(XY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2$.


Hint: Write the joint p.d.f. $f_{X,Y}$ as $f_{Y|X}(y \mid x)f_X(x)$ and use the fact
(see relation (54)) that $E(Y \mid X = x) = b_x = \mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x - \mu_1)$.


5.8 If the r.v.'s X and Y have the Bivariate Normal distribution, then, by 
using Exercise 5.7, show that the parameter p is, indeed, the correlation 
coefficient of the r.v.'s X and Y, p = p(X, Y). 


5.9 If the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution, and $c_1$, $c_2$
are constants, express the expectation $E(c_1X + c_2Y)$ and the variance
$Var(c_1X + c_2Y)$ in terms of $c_1$, $c_2$, $\mu_1 = EX$, $\mu_2 = EY$, $\sigma_1^2 = Var(X)$, $\sigma_2^2 =
Var(Y)$, and $\rho = \rho(X, Y)$.


5.10 If the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution, then it is
known (see, e.g., relation (11) on page 158 in the book A Course in
Mathematical Statistics, 2nd edition (1997), Academic Press, by G.G.
Roussas) that the joint m.g.f. of $X$ and $Y$ is given by:

\[
M_{X,Y}(t_1, t_2) = \exp\left[\mu_1t_1 + \mu_2t_2 + \frac{1}{2}\left(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2\right)\right], \quad t_1, t_2 \in \Re.
\]

Use this m.g.f. in order to show that:
$EX = \mu_1$, $EY = \mu_2$, $Var(X) = \sigma_1^2$, $Var(Y) = \sigma_2^2$,
$Cov(X, Y) = \rho\sigma_1\sigma_2$, and $\rho(X, Y) = \rho$.


5.11 Use the joint m.g.f. of the r.v.'s $X$ and $Y$ having a Bivariate Normal
distribution (see Exercise 5.10) in order to show that:

(i) If $X$ and $Y$ have the Bivariate Normal distribution with parameters
$\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, then, for any constants $c_1$ and $c_2$, the r.v. $c_1X +
c_2Y$ has the Normal distribution with parameters $c_1\mu_1 + c_2\mu_2$, and
$c_1^2\sigma_1^2 + 2c_1c_2\rho\sigma_1\sigma_2 + c_2^2\sigma_2^2$.

(ii) If the r.v. $c_1X + c_2Y$ is Normally distributed, then the r.v.'s $X$ and $Y$ have
the Bivariate Normal distribution with parameters $\mu_1 = EX$, $\mu_2 =
EY$, $\sigma_1^2 = Var(X)$, $\sigma_2^2 = Var(Y)$, and $\rho = \rho(X, Y)$.


5.12 Consider the function f defined by: 





us x2+y2 


e 2 
Coe) y 
an? 


for (x, y) outside the square [—1, 1] x [—1, 1] 
2 
2 +51 ag, for (a, y) in the square [—1, 1] x[—1, 1]. 
(i) Show that $f$ is a non-Bivariate Normal p.d.f.
(ii) Also, show that both marginals, call them $f_1$ and $f_2$, are $N(0, 1)$ p.d.f.'s.


Remark: We know that if $X$, $Y$ have the Bivariate Normal distribution,
then the distributions of the r.v.'s $X$ and $Y$ themselves are Normal. This
exercise shows that the inverse need not be true.


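5.13 Let the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with
parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, and set $U = X + Y$, $V = X - Y$. Then show
that:

(i) The r.v.'s $U$ and $V$ also have the Bivariate Normal distribution with
parameters $\mu_1 + \mu_2$, $\mu_1 - \mu_2$, $\tau_1^2 = \sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2$, $\tau_2^2 = \sigma_1^2 -
2\rho\sigma_1\sigma_2 + \sigma_2^2$, and $\rho_0 = (\sigma_1^2 - \sigma_2^2)/\tau_1\tau_2$.

(ii) $U \sim N(\mu_1 + \mu_2, \tau_1^2)$, $V \sim N(\mu_1 - \mu_2, \tau_2^2)$.

(iii) The r.v.'s $U$ and $V$ are uncorrelated if and only if $\sigma_1^2 = \sigma_2^2$.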






Chapter 5

Independence of Random Variables and Some Applications


This chapter consists of two sections. In the first section, we introduce the 
concept of independence of r.v.'s and establish criteria for proving or disprov- 
ing independence. Also, its relationship to uncorrelatedness is discussed. In 
the second section, the sample mean and the sample variance are defined, and 
some of their moments are also produced. The main thrust of this section, 
however, is the discussion of the reproductive property of certain distribu- 
tions. As a by-product, we also obtain the distribution of the sample mean and 
of a certain multiple of the sample variance for independent and Normally 
distributed r.v.'s. 


5.1 Independence of Random Variables and Criteria of Independence


In Section 4 of Chapter 2, the concept of independence of two events was
introduced and it was suitably motivated and illustrated by means of examples.
This concept was then generalized to more than two events. What is done in
this section is, essentially, to carry over the concept of independence from
events to r.v.'s. To this end, consider first two r.v.'s $X_1$ and $X_2$ and the events
induced in the sample space $S$ by each one of them separately as well as by
both of them jointly. That is, for subsets $B_1$, $B_2$ of $\Re$, let:

\begin{align*}
A_1 &= (X_1 \in B_1) = X_1^{-1}(B_1) = \{s \in S;\ X_1(s) \in B_1\}, \tag{1} \\
A_2 &= (X_2 \in B_2) = X_2^{-1}(B_2) = \{s \in S;\ X_2(s) \in B_2\}, \tag{2} \\
A_{12} &= ((X_1, X_2) \in B_1 \times B_2) = (X_1 \in B_1\ \&\ X_2 \in B_2) = (X_1, X_2)^{-1}(B_1 \times B_2) \\
&= \{s \in S;\ X_1(s) \in B_1\ \&\ X_2(s) \in B_2\} = A_1 \cap A_2. \tag{3}
\end{align*}


Then the r.v.'s $X_1$, $X_2$ are said to be independent if, for any $B_1$ and $B_2$ as before,
the corresponding events $A_1$ and $A_2$ are independent; that is, $P(A_1 \cap A_2) =
P(A_1)P(A_2)$. By (1)-(3), clearly, this relation is equivalent to:

\[
P(X_1 \in B_1, X_2 \in B_2) = P(X_1 \in B_1)P(X_2 \in B_2). \tag{4}
\]
This relation states, in effect, that information regarding one r.v. has no effect
on the probability distribution of the other r.v. For example, provided
$P(X_2 \in B_2) > 0$,
\begin{align*}
P(X_1 \in B_1 \mid X_2 \in B_2) &= \frac{P(X_1 \in B_1, X_2 \in B_2)}{P(X_2 \in B_2)} \\
&= \frac{P(X_1 \in B_1)P(X_2 \in B_2)}{P(X_2 \in B_2)} = P(X_1 \in B_1).
\end{align*}

Relation (4) is taken as the definition of independence of these two r.v.'s, which
is then generalized in a straightforward way to $k$ r.v.'s.


DEFINITION 1
Two r.v.'s $X_1$ and $X_2$ are said to be independent (statistically or
stochastically or in the probability sense) if, for any subsets $B_1$ and $B_2$ of $\Re$,

\[
P(X_1 \in B_1, X_2 \in B_2) = P(X_1 \in B_1)P(X_2 \in B_2).
\]

The r.v.'s $X_1, \ldots, X_k$ are said to be independent (in the same sense as
above) if, for any subsets $B_1, \ldots, B_k$ of $\Re$,
\[
P(X_i \in B_i,\ i = 1, \ldots, k) = \prod_{i=1}^{k} P(X_i \in B_i). \tag{5}
\]
Nonindependent r.v.'s are said to be dependent.


The practical question which now arises is how one checks independence
of $k$ given r.v.'s, or lack thereof. This is done by means of the following
criterion referred to as the Factorization Theorem because of the form of the
expressions involved.


THEOREM 1
(Criterion of independence, Factorization Theorem) For $k \ge 2$,
the r.v.'s $X_1, \ldots, X_k$ are independent if and only if any one of the following
three relations holds:

(i) $F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = F_{X_1}(x_1) \cdots F_{X_k}(x_k)$ (6)
for all $x_1, \ldots, x_k$ in $\Re$.

(ii) $f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = f_{X_1}(x_1) \cdots f_{X_k}(x_k)$ (7)
for all $x_1, \ldots, x_k$ in $\Re$.

(iii) $M_{X_1,\ldots,X_k}(t_1, \ldots, t_k) = M_{X_1}(t_1) \cdots M_{X_k}(t_k)$ (8)
for all $t_1, \ldots, t_k$ in a non-degenerate interval containing 0.
Before we proceed with the justification of this theorem, let us refer to
Example 1 in Chapter 4 and notice that: $f_X(3) = 0.04$, $f_Y(2) = 0.15$, and
$f_{X,Y}(3, 2) = 0.02$, so that $f_{X,Y}(3, 2) = 0.02 \ne 0.04 \times 0.15 = 0.006 = f_X(3)f_Y(2)$.

Accordingly, the r.v.'s $X$ and $Y$ are not independent. On the other hand, in
reference to Example 2 (see also Example 6), we have, for all $x, y > 0$:

\[
f_{X,Y}(x, y) = \lambda_1\lambda_2 e^{-\lambda_1x - \lambda_2y} = (\lambda_1e^{-\lambda_1x})(\lambda_2e^{-\lambda_2y}) = f_X(x)f_Y(y),
\]

so that $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ for all $x$ and $y$, and consequently, the r.v.'s
$X$ and $Y$ are independent. Finally, refer to the Bivariate Normal distribution
whose p.d.f. is given by (49) of Chapter 4 and set $\rho = 0$. Then, from (49), (50),
and (53), we have $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ for all $x$ and $y$. Therefore, $\rho = 0$
implies that the r.v.'s $X$ and $Y$ are independent.


EXAMPLE 1 Examine the r.v.'s $X$ and $Y$ from an independence viewpoint, if their joint p.d.f.
is given by: $f_{X,Y}(x, y) = 4xy$, $0 < x < 1$, $0 < y < 1$ (and 0 otherwise).


DISCUSSION We will use part (ii) of Theorem 1 for which the marginal
p.d.f.'s are needed. To this end, we have:

\[
f_X(x) = 4x\int_0^1 y\, dy = 2x, \quad 0 < x < 1;
\]
\[
f_Y(y) = 4y\int_0^1 x\, dx = 2y, \quad 0 < y < 1.
\]

Hence, for all $0 < x < 1$ and $0 < y < 1$, it holds that: $2x \times 2y = 4xy$, or
$f_X(x)f_Y(y) = f_{X,Y}(x, y)$. This relation is also, trivially, true (both sides are
equal to 0) for $x$ and $y$ not satisfying the inequalities $0 < x < 1$ and $0 < y < 1$.
It follows that X and Y are independent.
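The factorization criterion of Theorem 1(ii) can be checked numerically for the joint p.d.f. $4xy$. The sketch below obtains the marginals by a midpoint rule (a numerical stand-in for the integrals in the discussion) and compares their product with the joint p.d.f. at a few points; the helper names and test points are illustrative assumptions.

```python
def midpoint_integral(g, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def joint(x, y):
    """Joint p.d.f. of the example: 4xy on the open unit square, 0 otherwise."""
    return 4 * x * y if 0 < x < 1 and 0 < y < 1 else 0.0


def marginal_x(x):
    return midpoint_integral(lambda y: joint(x, y), 0.0, 1.0)


def marginal_y(y):
    return midpoint_integral(lambda x: joint(x, y), 0.0, 1.0)


points = [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]
max_gap = max(abs(joint(x, y) - marginal_x(x) * marginal_y(y)) for x, y in points)
print(marginal_x(0.3), marginal_y(0.8), max_gap)
```

A finite set of points cannot prove independence; the check only illustrates the identity $f_{X,Y} = f_X f_Y$ that the discussion establishes for all $x$ and $y$.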

Here are two examples where the r.v.'s involved are not independent. 


EXAMPLE 2 If the r.v.'s $X$ and $Y$ have joint p.d.f. given by: $f_{X,Y}(x, y) = 2$, $0 < x < y < 1$
(and 0 otherwise), check whether these r.v.'s are independent or not.


DISCUSSION Reasoning as in the previous example, we find:

\[
f_X(x) = 2\int_x^1 dy = 2(1 - x), \quad 0 < x < 1;
\]
\[
f_Y(y) = 2\int_0^y dx = 2y, \quad 0 < y < 1.
\]

Then independence of $X$ and $Y$ would require that: $4(1 - x)y = 2$ for all
$0 < x < y < 1$, which, clearly, need not hold. For example, for $x = \frac{1}{4}$, $y =
\frac{1}{2}$, $4(1 - x)y = 4 \times \frac{3}{4} \times \frac{1}{2} = \frac{3}{2} \ne 2$. Thus, the r.v.'s $X$ and $Y$ are not independent.
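A single point where the product of the marginals differs from the joint p.d.f. is enough to rule out independence. The sketch below evaluates both sides exactly for the triangular support $0 < x < y < 1$; it also looks at a point outside the support, where the joint p.d.f. is 0 but both marginals are positive. Function names and the second test point are illustrative assumptions.

```python
from fractions import Fraction


def joint(x, y):
    """Joint p.d.f.: 2 on the triangle 0 < x < y < 1, 0 otherwise."""
    return Fraction(2) if 0 < x < y < 1 else Fraction(0)


def marginal_x(x):
    return 2 * (1 - x) if 0 < x < 1 else Fraction(0)


def marginal_y(y):
    return 2 * y if 0 < y < 1 else Fraction(0)


# A point inside the support: the product of the marginals is 4(1 - x)y.
x, y = Fraction(1, 4), Fraction(1, 2)
inside_product = marginal_x(x) * marginal_y(y)
inside_joint = joint(x, y)

# A point outside the support (x > y): the joint p.d.f. vanishes, the product does not.
outside_product = marginal_x(Fraction(1, 2)) * marginal_y(Fraction(1, 4))
outside_joint = joint(Fraction(1, 2), Fraction(1, 4))

print(inside_product, inside_joint, outside_product, outside_joint)
```

The second comparison shows why a support that is not a product set forces dependence: the product of the marginals is positive on the whole unit square, whereas the joint p.d.f. is not.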


EXAMPLE 3 In reference to Example 8 in Chapter 4, the r.v.'s $X$ and $Y$ have joint p.d.f.
$f_{X,Y}(x, y) = 8xy$, $0 < x \le y < 1$ (and 0 otherwise), and:

\[
f_X(x) = 4x(1 - x^2), \quad 0 < x < 1; \qquad f_Y(y) = 4y^3, \quad 0 < y < 1.
\]

Independence of $X$ and $Y$ would require that: $4x(1 - x^2) \times 4y^3 = 8xy$ or
$(1 - x^2)y^2 = \frac{1}{2}$, $0 < x \le y < 1$. However, this relation need not be true
because, for example, for $x = \frac{1}{4}$ and $y = \frac{1}{2}$, we have: left-hand side $= \frac{15}{64} \ne \frac{1}{2} =$
right-hand side. So, the r.v.'s $X$ and $Y$ are dependent.


REMARK 1 On the basis of Examples 2 and 3, one may surmise the following
rule of thumb: If the arguments $x$ and $y$ (for the case of two r.v.'s) do not vary
independently of each other, the r.v.'s involved are likely to be dependent.

A special case of the following result will be needed for the proof of
Theorem 1.


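PROPOSITION 1 Consider the r.v.'s $X_1, \ldots, X_k$, the functions $g_i: \Re \to \Re$,
$i = 1, \ldots, k$, and suppose all expectations appearing below are finite. Then
independence of the r.v.'s $X_1, \ldots, X_k$ implies:
\[
E\left[\prod_{i=1}^{k} g_i(X_i)\right] = \prod_{i=1}^{k} Eg_i(X_i). \tag{9}
\]

PROOF Suppose the r.v.'s are of the continuous type (so that we use integrals;
replace them by summations, if the r.v.'s are discrete). Then:

\begin{align*}
E\left[\prod_{i=1}^{k} g_i(X_i)\right] &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g_1(x_1) \cdots g_k(x_k)\, f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \\
&= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g_1(x_1) \cdots g_k(x_k)\, f_{X_1}(x_1) \cdots f_{X_k}(x_k)\, dx_1 \cdots dx_k \quad \text{(by independence)} \\
&= \left[\int_{-\infty}^{\infty} g_1(x_1) f_{X_1}(x_1)\, dx_1\right] \cdots \left[\int_{-\infty}^{\infty} g_k(x_k) f_{X_k}(x_k)\, dx_k\right] \\
&= Eg_1(X_1) \cdots Eg_k(X_k) = \prod_{i=1}^{k} Eg_i(X_i). \quad ▲
\end{align*}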
i=1 


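COROLLARY 1 By taking $g_i(X_i) = e^{t_iX_i}$, $t_i \in \Re$, $i = 1, \ldots, k$, relation (9)
becomes:

\[
E\exp(t_1X_1 + \cdots + t_kX_k) = \prod_{i=1}^{k} E\exp(t_iX_i), \text{ or}
\]
\[
M_{X_1,\ldots,X_k}(t_1, \ldots, t_k) = \prod_{i=1}^{k} M_{X_i}(t_i). \tag{10}
\]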
COROLLARY 2 If the r.v.'s $X$ and $Y$ are independent, then they are
uncorrelated. The converse is also true, if the r.v.'s have the Bivariate Normal
distribution.

PROOF In (9), take $k = 2$, identify $X_1$ and $X_2$ with $X$ and $Y$, respectively,
and let $g_1(x) = g_2(x) = x$, $x \in \Re$. Then $E(XY) = (EX)(EY)$, which
implies $Cov(X, Y) = 0$ and $\rho(X, Y) = 0$. The converse for the Bivariate Normal
distribution follows by means of (50) and (53) in Chapter 4. ▲


REMARK 2 That uncorrelated r.v.'s are not, in general, independent may be
illustrated by means of examples (see, e.g., Exercise 1.20).
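A small discrete case makes the point of Remark 2 concrete. The sketch below does not use the table of Exercise 1.20; it relies on a standard textbook construction, assumed here for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance is 0, yet the joint probability at $(0, 0)$ differs from the product of the marginal probabilities.

```python
from fractions import Fraction

# Joint p.d.f. of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2 (assumed example).
third = Fraction(1, 3)
pmf = {(-1, 1): third, (0, 0): third, (1, 1): third}

ex = sum(x * p for (x, _), p in pmf.items())
ey = sum(y * p for (_, y), p in pmf.items())
exy = sum(x * y * p for (x, y), p in pmf.items())
cov = exy - ex * ey

# Marginal probabilities at 0 and the joint probability at (0, 0).
p_x0 = sum(p for (x, _), p in pmf.items() if x == 0)
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)
p_joint_00 = pmf.get((0, 0), Fraction(0))

print(cov, p_joint_00, p_x0 * p_y0)
```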


REMARK 3 If $X_1, \ldots, X_k$ are independent r.v.'s, then it is intuitively clear
that independence should be preserved for suitable functions of the $X_i$'s. For
example, if $Y_i = g_i(X_i)$, $i = 1, \ldots, k$, then the r.v.'s $Y_1, \ldots, Y_k$ are also
independent. Independence is also preserved if we take different functions of the
$X_i$'s, provided these functions do not include the same $X_i$'s. For instance, if
$Y = g(X_{i_1}, \ldots, X_{i_m})$ and $Z = h(X_{j_1}, \ldots, X_{j_n})$, where $1 \le i_1 < \cdots < i_m \le k$,
$1 \le j_1 < \cdots < j_n \le k$ and all $i_1, \ldots, i_m$ are distinct from all $j_1, \ldots, j_n$, then the r.v.'s
$Y$ and $Z$ are independent. This will be a rule of thumb to be followed in this
book.


PROOF OF THEOREM 1 The proof can be only partial but sufficient for the
purposes of this book.

(i) Independence of the r.v.'s $X_1, \ldots, X_k$ means that relation (5) is satisfied.
In particular, this is true if $B_i = (-\infty, x_i]$, $i = 1, \ldots, k$, which is (6). That
(6) implies (5) is a deep probabilistic result dealt with at a much higher
level.

(ii) Suppose the r.v.'s are independent and first assume they are discrete.
Then, by taking $B_i = \{x_i\}$, $i = 1, \ldots, k$ in (5), we obtain (7). If the r.v.'s are
continuous, then consider (6) and differentiate both sides with respect to
$x_1, \ldots, x_k$, which, once again, leads to (7) (for continuity points $x_1, \ldots, x_k$).

For the converse, suppose that (7) is true; that is, for all $t_1, \ldots, t_k$ in $\Re$,
\[
f_{X_1,\ldots,X_k}(t_1, \ldots, t_k) = f_{X_1}(t_1) \cdots f_{X_k}(t_k).
\]
Then, if the r.v.'s are discrete, sum over the $t_i$'s from $-\infty$ to $x_i$, $i = 1, \ldots, k$
to obtain (6); if the r.v.'s are continuous, replace the summation operations
by integrations in order to obtain (6) again. In either case, independence
follows.

(iii) Independence of $X_1, \ldots, X_k$ implies (8) by means of Corollary 1 to
Proposition 1 above.

The converse is also true but its proof will not be pursued here (it requires
the use of the so-called inversion formula as indicated in Section 1 of Chapter 3
and Remarks 1 and 2 of Chapter 4). ▲


Part (ii) of Theorem 1 has the following corollary, which provides still
another useful criterion for independence of $k$ r.v.'s.

COROLLARY 3 The r.v.'s $X_1, \ldots, X_k$ are independent if and only if
$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = h_1(x_1) \cdots h_k(x_k)$ for all $x_1, \ldots, x_k$ in $\Re$, where $h_i$ is a
nonnegative function of $x_i$ alone, $i = 1, \ldots, k$.


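PROOF Suppose the r.v.'s $X_1, \ldots, X_k$ are independent. Then, by (7),
$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = f_{X_1}(x_1) \cdots f_{X_k}(x_k)$ for all $x_1, \ldots, x_k$ in $\Re$, so that the
above factorization holds with $h_i = f_{X_i}$, $i = 1, \ldots, k$. Next, assume that the
factorization holds, and suppose that the r.v.'s are continuous. For each fixed
$i = 1, \ldots, k$, set

\[
c_i = \int_{-\infty}^{\infty} h_i(x_i)\, dx_i,
\]
so that
\begin{align*}
c_1 \cdots c_k &= \int_{-\infty}^{\infty} h_1(x_1)\, dx_1 \cdots \int_{-\infty}^{\infty} h_k(x_k)\, dx_k \\
&= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_1(x_1) \cdots h_k(x_k)\, dx_1 \cdots dx_k \\
&= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = 1.
\end{align*}

Then, integrating $f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)$ with respect to all $x_j$'s with $j \ne i$, we get

\[
f_{X_i}(x_i) = c_1 \cdots c_{i-1}c_{i+1} \cdots c_k\, h_i(x_i) = \frac{1}{c_i} h_i(x_i).
\]

Hence
\begin{align*}
f_{X_1}(x_1) \cdots f_{X_k}(x_k) &= \frac{1}{c_1 \cdots c_k}\, h_1(x_1) \cdots h_k(x_k) \\
&= h_1(x_1) \cdots h_k(x_k) = f_{X_1,\ldots,X_k}(x_1, \ldots, x_k),
\end{align*}
or $f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = f_{X_1}(x_1) \cdots f_{X_k}(x_k)$ for all $x_1, \ldots, x_k$ in $\Re$, so that the
r.v.'s $X_1, \ldots, X_k$ are independent. The same conclusion holds in case the r.v.'s
are discrete by using summations rather than integrations. ▲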


The significance of Corollary 3 is that, in order to check for independence
of the r.v.'s $X_1, \ldots, X_k$ all one has to do is to establish a factorization of $f_{X_1,\ldots,X_k}$
as stated in the corollary. One does not have to verify that the factors are the
marginal p.d.f.'s (that will follow as indicated previously).
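The role of the constants $c_i$ in the proof of Corollary 3 can be seen on a small case. In the sketch below, the joint p.d.f. $4xy$ on the unit square is written as $h_1(x)h_2(y)$ with $h_1(x) = 8x$ and $h_2(y) = y/2$; this particular split is an assumption chosen for illustration. Neither factor is a p.d.f., but dividing each by its integral recovers the marginals $2x$ and $2y$.

```python
def midpoint_integral(g, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def h1(x):
    """Factor depending on x alone; it integrates to 4, so it is not a p.d.f."""
    return 8 * x


def h2(y):
    """Factor depending on y alone; it integrates to 1/4."""
    return y / 2


c1 = midpoint_integral(h1, 0.0, 1.0)
c2 = midpoint_integral(h2, 0.0, 1.0)


def marginal_x(x):
    """Marginal p.d.f. of X recovered as h1 / c1, as in the proof."""
    return h1(x) / c1


def marginal_y(y):
    """Marginal p.d.f. of Y recovered as h2 / c2."""
    return h2(y) / c2


print(c1, c2, c1 * c2, marginal_x(0.3), marginal_y(0.8))
```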

This section is concluded with the definition of what is known as a random
sample. Namely, $n$ independent and identically distributed (i.i.d.) r.v.'s are
referred to as forming a random sample of size $n$. Some of their properties are
discussed in the next section.
1.1 In reference to Exercise 2.5 in Chapter 4, determine whether or not the
r.v.'s $X$ and $Y$ are independent. Justify your answer.


1.2 In reference to Exercises 1.1 and 2.1 in Chapter 4, determine whether or 
not the r.v.'s X and Y are independent. 


1.3 The r.v.'s $X$, $Y$, and $Z$ have the joint p.d.f. given by: $f_{X,Y,Z}(x, y, z) = \frac{1}{4}$ if
$x = 1, y = z = 0$; $x = 0, y = 1, z = 0$; $x = y = 0, z = 1$; $x = y = z = 1$.
(i) Derive the marginal joint p.d.f.'s $f_{X,Y}$, $f_{X,Z}$, $f_{Y,Z}$.
(ii) Derive the marginal p.d.f.'s $f_X$, $f_Y$, and $f_Z$.
(iii) Show that any two of the r.v.'s $X$, $Y$, and $Z$ are independent.
(iv) Show that the r.v.'s $X$, $Y$, and $Z$ are dependent.





1.4 In reference to Exercise 2.8 in Chapter 4, decide whether or not the r.v.’s 
X and Y are independent. Justify your answer. 


1.5 In reference to Examples 4 and 7 in Chapter 4, investigate whether or not
the r.v.'s $X$ and $Y$ are independent and justify your answer.


1.6 Let $X$ and $Y$ be r.v.'s with joint p.d.f. given by:

\[
f_{X,Y}(x, y) = \frac{6}{5}(x^2 + y), \quad 0 \le x \le 1, \quad 0 \le y \le 1.
\]


(i) Determine the marginal p.d.f.'s fx and fy. 
(ii) Investigate whether or not the r.v.’s X and Y are independent. Justify 
your answer. 


1.7 The r.v.'s $X$ and $Y$ have joint p.d.f. given by:
$f_{X,Y}(x, y) = 1$, $0 < x < 1$, $0 < y < 1$.

Then:
(i) Derive the marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) Show that $X$ and $Y$ are independent.
(iii) Calculate the probability $P(X + Y < c)$.
(iv) Give the numerical value of the probability in part (iii) for $c = 1/4$.

1.8 The r.v.'s $X$, $Y$, and $Z$ have joint p.d.f. given by:
$f_{X,Y,Z}(x, y, z) = 8xyz$, $0 < x < 1$, $0 < y < 1$, $0 < z < 1$.
(i) Derive the marginal p.d.f.’s fx, fy, and fz. 


(ii) Show that the r.v.’s X, Y, and Z are independent. 
(iii) Calculate the probability P(X < Y < Z). 


1.9 The r.v.'s $X$ and $Y$ have joint p.d.f. given by:
$f_{X,Y}(x, y) = c$, for $x^2 + y^2 \le 9$.


(i) Determine the constant c. 
(ii) Derive the marginal p.d.f.’s fx and fy. 
(iii) Show that the r.v.’s X and Y are dependent. 
1.10 The r.v.'s $X$, $Y$, and $Z$ have joint p.d.f. given by:
Fira, y D = de Ow), x>0, y>0, 2>0. 


(i) Determine the constant c. 
(ii) Derive the marginal joint p.d.f.’s fx, y, fx,z, and fy. z. 
(iii) Derive the marginal p.d.f.’s fx, fy, and fz. 
(iv) Show that any two of the r.v.’s X, Y, and Z, as well as all three r.v.'s 
are independent. 


1.11 The r.v.'s $X$ and $Y$ have joint p.d.f. given by the following product:
$f_{X,Y}(x, y) = g(x)h(y)$, where $g$ and $h$ are nonnegative functions.
(i) Derive the marginal p.d.f.'s $f_X$ and $f_Y$ as functions of $g$ and $h$,
respectively.
(ii) Show that the r.v.'s $X$ and $Y$ are independent.
(iii) If $h = g$, then the r.v.'s are identically distributed.
(iv) From part (iii), conclude that $P(X > Y) = 1/2$, provided the
distribution is of the continuous type.


1.12 The life of a certain part in a new automobile is a r.v. $X$ whose p.d.f. is
Negative Exponential with parameter $\lambda = 0.005$ days.

(i) What is the expected life of the part in question? 

(ii) If the automobile comes with a spare part whose life is a r.v. Y dis- 
tributed as X and independent of it, find the p.d.f. of the combined 
life of the part and its spare. 

(iii) What is the probability that X + Y > 500 days? 


1.13 Let the r.v. X be distributed as U(0, 1) and set Y = —logX. 
(i) Determine the d.f. of Y and then its p.d.f. 
(ii) If the r.v.'s Yi, ..., Y, are independently distributed as Y, and Z = 
Y, +---+ Yn, determine the distribution of the r.v. Z. 


1.14 Let the independent r.v.'s $X$ and $Y$ be distributed as $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$, respectively, and define the r.v.'s $U$ and $V$ by: $U = aX + b$, $V =
cY + d$, where $a$, $b$, $c$, and $d$ are constants.

(i) Use the m.g.f. approach in order to show that:

\[
U \sim N(a\mu_1 + b, (a\sigma_1)^2), \quad V \sim N(c\mu_2 + d, (c\sigma_2)^2).
\]

(ii) Determine the joint m.g.f. of $U$ and $V$.
(iii) From parts (i) and (ii), conclude that $U$ and $V$ are independent.


1.15 Let $X$ and $Y$ be independent r.v.'s denoting the lifetimes of two batteries
and having the Negative Exponential distribution with parameter $\lambda$. Set
$T = X + Y$ and:
(i) Determine the d.f. of $T$ by integration, and then the corresponding
p.d.f.
(ii) Determine the p.d.f. of $T$ by using the m.g.f.'s approach.
(iii) For $\lambda = 1/3$, calculate the probability $P(T \le 6)$.


1.16 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with m.g.f. $M$, and let $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$.
Express the m.g.f. $M_{\bar{X}}$ in terms of $M$.




1.17 In reference to Exercise 3.1 in Chapter 4: 
(i) Calculate the Var(X + Y) and the s.d. of X + Y. 
(ii) Use the Tchebichev inequality to determine a lower bound for the 
probability: P(X + Y < 10). 


1.18 Let $p$ be the proportion of defective computer chips in a very large lot of
chips produced over a period of time by a certain manufacturing process.
For $i = 1, \ldots, n$, associate with the $i$th chip the r.v. $X_i$, where $X_i = 1$
if the $i$th chip is defective, and $X_i = 0$ otherwise. Then $X_1, \ldots, X_n$ are
independent r.v.'s distributed as $B(1, p)$, and let $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$.
(i) Calculate the $E\bar{X}$ and the $Var(\bar{X})$ in terms of $p$ and $q = 1 - p$.
(ii) Use the Tchebichev inequality to determine the smallest value of $n$
for which $P(|\bar{X} - p| < 0.1\sqrt{pq}) \ge 0.99$.


1.19 Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as $P(\lambda)$, and set $\bar{X} =
\frac{1}{n}(X_1 + \cdots + X_n)$.
(i) Calculate the $E\bar{X}$ and the $Var(\bar{X})$ in terms of $\lambda$ and $n$.
(ii) Use the Tchebichev inequality to determine the smallest $n$, in terms
of $\lambda$ and $c$, for which $P(|\bar{X} - \lambda| < c) \ge 0.95$, for some $c > 0$.
(iii) Give the numerical value of $n$ for $c = \sqrt{\lambda}$ and $c = 0.1\sqrt{\lambda}$.


1.20 The joint distribution of the r.v.'s X and Y is given by:

    y \ x  |   -1         0          1
    -------+------------------------------
     -1    |  $\alpha$   $\beta$    $\alpha$
      0    |  $\beta$    0          $\beta$
      1    |  $\alpha$   $\beta$    $\alpha$

where $\alpha, \beta > 0$ with $\alpha + \beta = 1/4$.
(i) Derive the marginal p.d.f.'s $f_X$ and $f_Y$.
(ii) Calculate the EX, EY, and E(XY).
(iii) Show that Cov(X, Y) = 0.
(iv) Show that the r.v.'s X and Y are dependent.


Remark: Whereas independent r.v.'s are always uncorrelated, this ex- 
ercise shows that the converse need not be true. 


1.21 Refer to Exercise 1.10 and calculate the following quantities without any 
integration: E(XY ), E(XY Z), Var(X + Y), Var(X + Y + Z). 


1.22 The i.i.d. r.v.'s $X_1, \ldots, X_n$ have expectation $\mu \in \Re$ and variance $\sigma^2 < \infty$, and set $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$.
(i) Determine the $E\bar{X}$ and the $Var(\bar{X})$ in terms of $\mu$ and $\sigma$.
(ii) Use the Tchebichev inequality to determine the smallest value of n for which $P(|\bar{X} - \mu| < k\sigma)$ is at least 0.99; take k = 1, 2, 3.


1.23 A piece of equipment works on a battery whose lifetime is a r.v. X with expectation $\mu$ and s.d. $\sigma$. If n such batteries are used successively and independently of each other, denote by $X_1, \ldots, X_n$ their respective lifetimes, so that $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ is the average lifetime of the batteries. Use the Tchebichev inequality to determine the smallest value of n for which $P(|\bar{X} - \mu| < 0.5\sigma) \ge 0.99$.


1.24 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with $EX_1 = \mu \in \Re$ and $Var(X_1) = \sigma^2 < \infty$, and set $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$.
(i) Calculate the $E\bar{X}$ and the $Var(\bar{X})$ in terms of $\mu$ and $\sigma$.
(ii) Use the Tchebichev inequality in order to determine the smallest value of n, in terms of the positive constant c and $\alpha$, so that

\[ P(|\bar{X} - \mu| < c\sigma) \ge \alpha \quad (0 < \alpha < 1). \]

(iii) What is the numerical value of n in part (ii) if c = 0.1 and $\alpha = 0.90$, $\alpha = 0.95$, $\alpha = 0.99$?


1.25 In reference to Exercise 5.13(iii) in Chapter 4, show that the r.v.'s U and V are independent if and only if $\sigma_1^2 = \sigma_2^2$.


5.2 The Reproductive Property of Certain Distributions


Independence plays a decisive role in the reproductive property of certain r.v.'s. Specifically, if $X_1, \ldots, X_k$ are r.v.'s having certain distributions, then, if they are also independent, it follows that the r.v. $X_1 + \cdots + X_k$ is of the same kind. This is, basically, the content of this section. The tool used in order to establish this assertion is the m.g.f., and the basic result employed is relation (8), characterizing independence of r.v.'s. The conditions of applicability of (8) hold in all cases considered here.

First, we derive some general results regarding the sample mean and the sample variance of k r.v.'s, which will be used, in particular, in the Normal distribution case discussed below. To this end, for any k r.v.'s $X_1, \ldots, X_k$, their sample mean, denoted by $\bar{X}_k$ or just $\bar{X}$, is defined by:

\[ \bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i. \tag{11} \]

The sample variance of the $X_i$'s, denoted by $S_k^2$ or just $S^2$, is defined by:

\[ S^2 = \frac{1}{k} \sum_{i=1}^{k} (X_i - EX_i)^2, \]

provided the $EX_i$'s are finite. In particular, if $EX_1 = \cdots = EX_k = \mu$, say, then $S^2$ becomes:

\[ S^2 = \frac{1}{k} \sum_{i=1}^{k} (X_i - \mu)^2. \tag{12} \]

The r.v.'s defined by (11) and (12) are most useful when the underlying r.v.'s form a random sample; that is, they are i.i.d.


PROPOSITION 2 Let $X_1, \ldots, X_k$ be i.i.d. r.v.'s with (finite) mean $\mu$. Then $E\bar{X} = \mu$. Furthermore, if the $X_i$'s also have (finite) variance $\sigma^2$, then $Var(\bar{X}) = \sigma^2/k$ and $ES^2 = \sigma^2$.

PROOF The first result follows from (40) in Chapter 4 by taking $c_1 = \cdots = c_k = 1/k$. The second result follows from (44) in the same chapter, by way of Corollary 2 to Proposition 1 here, because independence of $X_i$ and $X_j$, for $i \ne j$, implies $\rho(X_i, X_j) = 0$. In order to check the third result, observe first that:

\[ \sum_{i=1}^{k} (X_i - \mu)^2 = \sum_{i=1}^{k} X_i^2 + k\mu^2 - 2\mu \sum_{i=1}^{k} X_i, \]

so that

\[ E \sum_{i=1}^{k} (X_i - \mu)^2 = \sum_{i=1}^{k} EX_i^2 + k\mu^2 - 2\mu \cdot k\mu = \sum_{i=1}^{k} (\sigma^2 + \mu^2) + k\mu^2 - 2k\mu^2 = k\sigma^2. \]

Then $ES^2 = E\left[\frac{1}{k} \sum_{i=1}^{k} (X_i - \mu)^2\right] = \frac{1}{k} \cdot k\sigma^2 = \sigma^2$. ▲
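The three conclusions of Proposition 2 can be checked exactly on a small discrete case. The sketch below is only an illustration: the fair six-sided die and the sample size k = 3 are assumptions chosen for convenience and do not come from the text. It enumerates all equally likely samples and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example distribution (not from the text): a fair six-sided die.
values = [Fraction(v) for v in range(1, 7)]
prob = Fraction(1, 6)
k = 3  # sample size, chosen arbitrarily

mu = sum(v * prob for v in values)                   # population mean
sigma2 = sum((v - mu) ** 2 * prob for v in values)   # population variance

e_mean = Fraction(0)     # accumulates E(X-bar)
e_mean_sq = Fraction(0)  # accumulates E(X-bar^2)
e_s2 = Fraction(0)       # accumulates E(S^2), S^2 as in (12) with known mu
for sample in product(values, repeat=k):
    p = prob ** k  # independence: the joint probability is a product
    xbar = sum(sample) / k
    s2 = sum((x - mu) ** 2 for x in sample) / k
    e_mean += p * xbar
    e_mean_sq += p * xbar ** 2
    e_s2 += p * s2

var_mean = e_mean_sq - e_mean ** 2
print(e_mean, var_mean, e_s2)
```

Because the arithmetic is exact, the three printed quantities can be compared directly with $\mu$, $\sigma^2/k$, and $\sigma^2$.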


The general thrust of the following four results is to the effect that, if $X_1, \ldots, X_k$ are independent and have certain distributions, then their sum $X_1 + \cdots + X_k$ has a distribution of the same respective kind. The proof of this statement relies on relation (8), which is validated on account of (10).





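THEOREM 2
Let the r.v.'s $X_1, \ldots, X_k$ be independent and let $X_i \sim B(n_i, p)$ (the same p), i = 1, ..., k. Then $\sum_{i=1}^{k} X_i \sim B(\sum_{i=1}^{k} n_i, p)$.

PROOF By independence, relation (20) in Chapter 3, and $t \in \Re$:

\[ M_{\sum_{i=1}^{k} X_i}(t) = \prod_{i=1}^{k} M_{X_i}(t) = \prod_{i=1}^{k} (pe^t + q)^{n_i} = (pe^t + q)^{\sum_{i=1}^{k} n_i}, \]

which is the m.g.f. of $B(\sum_{i=1}^{k} n_i, p)$. Then $\sum_{i=1}^{k} X_i \sim B(\sum_{i=1}^{k} n_i, p)$. ▲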
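The reproductive property of the Binomial distribution can also be checked numerically without m.g.f.'s, by convolving the two probability functions directly. The following is a minimal sketch; the parameter values and the helper names `binom_pmf` and `convolve` are assumptions made for illustration only.

```python
from math import comb

def binom_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def convolve(f, g):
    """Probability function of the sum of two independent integer-valued r.v.'s."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

# Hypothetical parameters (the same p for both summands, as the theorem requires).
n1, n2, p = 3, 4, 0.3
pmf_sum = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
pmf_direct = binom_pmf(n1 + n2, p)
max_gap = max(abs(a - b) for a, b in zip(pmf_sum, pmf_direct))
print(max_gap)  # should be at the level of floating-point rounding
```

If the two summands are given different values of p, the gap is no longer negligible, which illustrates why the theorem requires a common p.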





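THEOREM 3
Let the r.v.'s $X_1, \ldots, X_k$ be independent and let $X_i \sim P(\lambda_i)$, i = 1, ..., k. Then $\sum_{i=1}^{k} X_i \sim P(\sum_{i=1}^{k} \lambda_i)$.

PROOF As above, employ independence and relation (24) in Chapter 3 in order to obtain:

\[ M_{\sum_{i=1}^{k} X_i}(t) = \prod_{i=1}^{k} M_{X_i}(t) = \prod_{i=1}^{k} \exp(\lambda_i e^t - \lambda_i) = \exp\left[\left(\sum_{i=1}^{k} \lambda_i\right) e^t - \sum_{i=1}^{k} \lambda_i\right], \]

which is the m.g.f. of $P(\sum_{i=1}^{k} \lambda_i)$, so that $\sum_{i=1}^{k} X_i \sim P(\sum_{i=1}^{k} \lambda_i)$. ▲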


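THEOREM 4
Let the r.v.'s $X_1, \ldots, X_k$ be independent and let $X_i \sim N(\mu_i, \sigma_i^2)$, i = 1, ..., k. Then $\sum_{i=1}^{k} X_i \sim N(\sum_{i=1}^{k} \mu_i, \sum_{i=1}^{k} \sigma_i^2)$. In particular, if $\mu_1 = \cdots = \mu_k = \mu$ and $\sigma_1 = \cdots = \sigma_k = \sigma$, then $\sum_{i=1}^{k} X_i \sim N(k\mu, k\sigma^2)$.

PROOF Use independence and formula (44) in Chapter 3, for $t \in \Re$, in order to obtain:

\[ M_{\sum_{i=1}^{k} X_i}(t) = \prod_{i=1}^{k} M_{X_i}(t) = \prod_{i=1}^{k} \exp\left(\mu_i t + \frac{\sigma_i^2}{2} t^2\right) = \exp\left[\left(\sum_{i=1}^{k} \mu_i\right) t + \frac{\sum_{i=1}^{k} \sigma_i^2}{2} t^2\right], \]

which is the m.g.f. of $N(\sum_{i=1}^{k} \mu_i, \sum_{i=1}^{k} \sigma_i^2)$, so that $\sum_{i=1}^{k} X_i \sim N(\sum_{i=1}^{k} \mu_i, \sum_{i=1}^{k} \sigma_i^2)$. The special case is immediate. ▲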


To this theorem, there are the following two corollaries. 


COROLLARY 1 If the r.v.'s $X_1, \ldots, X_k$ are independent and distributed as $N(\mu, \sigma^2)$, then their sample mean $\bar{X} \sim N(\mu, \sigma^2/k)$, and $\sqrt{k}(\bar{X} - \mu)/\sigma \sim N(0, 1)$.

PROOF Here $\bar{X} = Y_1 + \cdots + Y_k$, where $Y_i = X_i/k$, i = 1, ..., k are independent and $Y_i \sim N(\mu/k, \sigma^2/k^2)$ by Theorem 2 in Chapter 3, applied with c = 1/k and d = 0. Then the conclusion follows by Theorem 4. The second conclusion is immediate by the part just established and Proposition 1 in Chapter 3, since

\[ \frac{\sqrt{k}(\bar{X} - \mu)}{\sigma} = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/k}}. \] ▲

COROLLARY 2 Let the r.v.'s $X_1, \ldots, X_k$ be independent, let $X_i \sim N(\mu_i, \sigma_i^2)$, i = 1, ..., k, and let $c_i$, i = 1, ..., k be constants. Then $\sum_{i=1}^{k} c_i X_i \sim N(\sum_{i=1}^{k} c_i \mu_i, \sum_{i=1}^{k} c_i^2 \sigma_i^2)$.

PROOF As in Corollary 1, $X_i \sim N(\mu_i, \sigma_i^2)$ implies $c_i X_i \sim N(c_i \mu_i, c_i^2 \sigma_i^2)$, and the r.v.'s $c_i X_i$, i = 1, ..., k are independent. Then the conclusion follows from the theorem. ▲





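THEOREM 5
Let the r.v.'s $X_1, \ldots, X_k$ be independent and let $X_i \sim \chi^2_{r_i}$, i = 1, ..., k. Then $\sum_{i=1}^{k} X_i \sim \chi^2_{r_1 + \cdots + r_k}$.

PROOF Use independence and formula (37) in Chapter 3, for $t < \frac{1}{2}$, to obtain:

\[ M_{\sum_{i=1}^{k} X_i}(t) = \prod_{i=1}^{k} M_{X_i}(t) = \prod_{i=1}^{k} \frac{1}{(1 - 2t)^{r_i/2}} = \frac{1}{(1 - 2t)^{(r_1 + \cdots + r_k)/2}}, \]

which is the m.g.f. of $\chi^2_{r_1 + \cdots + r_k}$. ▲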
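COROLLARY Let the r.v.'s $X_1, \ldots, X_k$ be independent and let $X_i \sim N(\mu_i, \sigma_i^2)$, i = 1, ..., k. Then $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2_k$ and, in particular, if $\mu_1 = \cdots = \mu_k = \mu$ and $\sigma_1^2 = \cdots = \sigma_k^2 = \sigma^2$, then $\frac{kS^2}{\sigma^2} \sim \chi^2_k$, where $S^2$ is given in (12).

PROOF The assumption $X_i \sim N(\mu_i, \sigma_i^2)$ implies that $\frac{X_i - \mu_i}{\sigma_i} \sim N(0, 1)$ by Proposition 1 in Chapter 3, and $\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2_1$ by Proposition 2 in the same chapter. Since independence of $X_i$, i = 1, ..., k implies that of $\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$, i = 1, ..., k, the theorem applies and yields the first assertion. The second assertion follows from the first by taking $\mu_1 = \cdots = \mu_k = \mu$ and $\sigma_1 = \cdots = \sigma_k = \sigma$, and using (12) to obtain $\frac{kS^2}{\sigma^2} = \sum_{i=1}^{k} \left(\frac{X_i - \mu}{\sigma}\right)^2$. ▲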


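REMARK 4 From the fact that $\frac{kS^2}{\sigma^2} \sim \chi^2_k$ and formula (37) in Chapter 3, we have $E\left(\frac{kS^2}{\sigma^2}\right) = k$, $Var\left(\frac{kS^2}{\sigma^2}\right) = 2k$, or $ES^2 = \sigma^2$ and $Var(S^2) = 2\sigma^4/k$.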


REMARK 5 Knowing the distribution of $\sum_{i=1}^{k} X_i$ is of considerable practical importance. For instance, if $X_i$ is the number of defective items among $n_i$ in the ith lot of certain items, i = 1, ..., k, then $\sum_{i=1}^{k} X_i$ is the total number of defective items in the k lots (and Theorem 2 applies). Likewise, if $X_i$ is the number of particles emitted by the ith radioactive source, i = 1, ..., k, then $\sum_{i=1}^{k} X_i$ is the total number of particles emitted by all k radioactive sources (and Theorem 3 applies). Also, if $X_i$ is the rain (in inches, for example) which fell in the ith location over a specified period of time, i = 1, ..., k, then $\sum_{i=1}^{k} X_i$ is the total rainfall in all of k locations under consideration over the specified period of time (and Theorem 4 applies). Finally, if $Y_i$ denotes the lifetime of the ith battery in a lot of k identical batteries, whose lifetime is assumed to be normally distributed, then $X_i = [(Y_i - \mu)/\sigma]^2$ measures a deviation from the mean lifetime $\mu$, and $\sum_{i=1}^{k} X_i$ is the totality of such deviations for the k batteries (and Theorem 5 applies).

Here are some numerical applications.


The defective items in two lots of sizes $n_1 = 10$ and $n_2 = 15$ occur independently at the rate of 6.25%. Calculate the probabilities that the total number of defective items: (i) Does not exceed 2; (ii) Is more than 5.


DISCUSSION If $X_1$ and $X_2$ are the r.v.'s denoting the numbers of defective items in the two lots, then $X_1 \sim B(10, 0.0625)$, $X_2 \sim B(15, 0.0625)$ and they are independent. Then $X = X_1 + X_2 \sim B(25, 0.0625)$ and therefore: (i) $P(X \le 2) = 0.7968$ and (ii) $P(X > 5) = 1 - P(X \le 5) = 0.0038$ (from the Binomial tables).
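The two table values quoted in the discussion can be reproduced by summing the B(25, 0.0625) probabilities directly. This is a small sketch; the helper name `binom_cdf` is introduced here for illustration and is not part of the text.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(x + 1))

n, p = 10 + 15, 0.0625             # X = X1 + X2 ~ B(25, 0.0625), by Theorem 2
p_at_most_2 = binom_cdf(2, n, p)        # part (i)
p_more_than_5 = 1 - binom_cdf(5, n, p)  # part (ii)
print(round(p_at_most_2, 4), round(p_more_than_5, 4))
```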


Five radioactive sources independently emit particles at the rate of 0.08 per 
certain time unit. What is the probability that the total number of particles 
does not exceed 3 in the time unit considered? 




DISCUSSION In obvious notation, we have here the independent r.v.'s $X_i$ distributed as P(0.08), i = 1, ..., 5. Then $X = \sum_{i=1}^{5} X_i \sim P(0.4)$, and the required probability is: $P(X \le 3) = 0.999224$ (from the Poisson tables).
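The Poisson table value can likewise be recomputed from the probability function of P(0.4). The helper name `poisson_cdf` below is an illustrative assumption.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ P(lam), by direct summation."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(x + 1))

lam_total = 5 * 0.08  # sum of five independent P(0.08) r.v.'s, by Theorem 3
p_at_most_3 = poisson_cdf(3, lam_total)
print(round(p_at_most_3, 6))
```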


The rainfall in two locations is measured (in inches over a certain time unit) by two independent and Normally distributed r.v.'s $X_1$ and $X_2$ as follows: $X_1 \sim N(10, 9)$ and $X_2 \sim N(15, 25)$. What is the probability that the total rainfall: (i) Will exceed 30 inches (which may result in flooding)? (ii) Will be less than 8 inches (which will mean a drought)?


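DISCUSSION If $X = X_1 + X_2$, then $X \sim N(25, 34)$, so that: (i) $P(X > 30) = 1 - P(X \le 30) = 1 - P\left(Z \le \frac{30 - 25}{\sqrt{34}}\right) \simeq 1 - \Phi(0.86) = 1 - 0.805105 = 0.194895$, and (ii) $P(X < 8) = P\left(Z < \frac{8 - 25}{\sqrt{34}}\right) \simeq \Phi(-2.92) = 1 - \Phi(2.92) = 1 - 0.99825 = 0.00175$.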
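For comparison, the same two probabilities can be computed without rounding the standardized values to two decimals. The sketch below expresses $\Phi$ through the error function; the helper name `std_normal_cdf` is an assumption made for illustration. The results differ slightly from the table-based values above because 0.8575 and -2.9155 are not rounded to 0.86 and -2.92.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z), the d.f. of N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean, var = 10 + 15, 9 + 25   # X = X1 + X2 ~ N(25, 34), by Theorem 4
sd = sqrt(var)
p_flood = 1 - std_normal_cdf((30 - mean) / sd)   # part (i)
p_drought = std_normal_cdf((8 - mean) / sd)      # part (ii)
print(p_flood, p_drought)
```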

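In the definition of $S^2$ by (12), we often replace $\mu$ by the sample mean $\bar{X}$; this is done habitually in statistics as $\mu$ is not really known. Let us denote by $\bar{S}^2$ the resulting quantity; that is,

\[ \bar{S}^2 = \frac{1}{k} \sum_{i=1}^{k} (X_i - \bar{X})^2. \tag{13} \]

Then it is easy to establish the following identity:

\[ \sum_{i=1}^{k} (X_i - \mu)^2 = \sum_{i=1}^{k} (X_i - \bar{X})^2 + k(\bar{X} - \mu)^2, \tag{14} \]

or

\[ kS^2 = k\bar{S}^2 + [\sqrt{k}(\bar{X} - \mu)]^2. \tag{15} \]

Indeed,

\[ \sum_{i=1}^{k} (X_i - \mu)^2 = \sum_{i=1}^{k} [(X_i - \bar{X}) + (\bar{X} - \mu)]^2 = \sum_{i=1}^{k} (X_i - \bar{X})^2 + k(\bar{X} - \mu)^2, \]

since $\sum_{i=1}^{k} (X_i - \bar{X})(\bar{X} - \mu) = (\bar{X} - \mu)(k\bar{X} - k\bar{X}) = 0$.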
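Identity (14) is purely algebraic, so it holds for any numbers whatsoever. A quick numerical check, with made-up data and a made-up value of $\mu$ (both are assumptions for illustration), might look as follows.

```python
# Hypothetical observations and a hypothetical "true" mean; any values would do.
data = [2.0, 3.5, 5.0, 7.5]
mu = 4.0

k = len(data)
xbar = sum(data) / k

lhs = sum((x - mu) ** 2 for x in data)                          # left side of (14)
rhs = sum((x - xbar) ** 2 for x in data) + k * (xbar - mu) ** 2  # right side of (14)
print(lhs, rhs)
```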
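From (15), we have, dividing through by $\sigma^2$:

\[ \frac{kS^2}{\sigma^2} = \frac{k\bar{S}^2}{\sigma^2} + \left[\frac{\sqrt{k}(\bar{X} - \mu)}{\sigma}\right]^2. \tag{16} \]

Now $\frac{kS^2}{\sigma^2} \sim \chi^2_k$ and $\left[\frac{\sqrt{k}(\bar{X} - \mu)}{\sigma}\right]^2 \sim \chi^2_1$ (by Propositions 1 and 2 in Chapter 3) when the r.v.'s $X_1, \ldots, X_k$ are independently distributed as $N(\mu, \sigma^2)$. Therefore, from (16), it appears quite feasible that $\frac{k\bar{S}^2}{\sigma^2} \sim \chi^2_{k-1}$. This is, indeed, the case and is the content of the following theorem. This theorem is presently established under an assumption to be justified later on (see Theorem 9 in Chapter 6). The assumption is this: If the r.v.'s $X_1, \ldots, X_k$ are independent and distributed as $N(\mu, \sigma^2)$, then the r.v.'s $\bar{X}$ and $\bar{S}^2$ are independent. (The independence of $\bar{X}$ and $\bar{S}^2$ implies then that of $\left[\frac{\sqrt{k}(\bar{X} - \mu)}{\sigma}\right]^2$ and $\frac{k\bar{S}^2}{\sigma^2}$.)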
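THEOREM 6
Let the r.v.'s $X_1, \ldots, X_k$ be independent and distributed as $N(\mu, \sigma^2)$, and let $\bar{S}^2$ be defined by (13). Then $\frac{k\bar{S}^2}{\sigma^2} \sim \chi^2_{k-1}$. Consequently, $E\bar{S}^2 = \frac{k-1}{k}\sigma^2$ and $Var(\bar{S}^2) = \frac{2(k-1)}{k^2}\sigma^4$.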











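PROOF Consider relation (16), take the m.g.f.'s of both sides, and use the corollary to Theorem 5 and the assumption of independence made previously in order to obtain:

\[ M_{kS^2/\sigma^2}(t) = M_{k\bar{S}^2/\sigma^2}(t) \, M_{[\sqrt{k}(\bar{X} - \mu)/\sigma]^2}(t), \]

so that

\[ M_{k\bar{S}^2/\sigma^2}(t) = M_{kS^2/\sigma^2}(t) \big/ M_{[\sqrt{k}(\bar{X} - \mu)/\sigma]^2}(t), \]

or

\[ M_{k\bar{S}^2/\sigma^2}(t) = \frac{1/(1 - 2t)^{k/2}}{1/(1 - 2t)^{1/2}} = \frac{1}{(1 - 2t)^{(k-1)/2}}, \]

which is the m.g.f. of the $\chi^2_{k-1}$ distribution. The second assertion follows immediately from the first and formula (37) in Chapter 3. ▲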
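Since a $\chi^2_{k-1}$ r.v. has expectation k - 1, the theorem implies that $\bar{S}^2$ underestimates $\sigma^2$ on the average by the factor (k - 1)/k. A rough simulation can make this visible. The sketch below is only an illustration; the values of $\mu$, $\sigma$, k, the seed, and the number of repetitions are arbitrary assumptions.

```python
import random

random.seed(0)               # fixed seed so that the run is repeatable
mu, sigma, k = 10.0, 2.0, 5  # hypothetical parameters
reps = 50_000                # number of simulated samples

total = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(k)]
    xbar = sum(xs) / k
    total += sum((x - xbar) ** 2 for x in xs) / k  # S-bar^2 as in (13)

mean_s2_bar = total / reps              # simulated average of S-bar^2
expected = (k - 1) / k * sigma ** 2     # (k - 1) sigma^2 / k
print(mean_s2_bar, expected)
```

With these assumed values the target is 3.2 rather than $\sigma^2 = 4$, and the simulated average should fall close to it.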


This chapter is concluded with the following comment. Theorems 2-5 may be misleading in that they may suggest that the sum of independent r.v.'s always has a distribution of the same kind as the summands. That this is definitely not so is illustrated by examples. For instance, if the independent r.v.'s X and Y are U(0, 1), then their sum X + Y is not uniform; rather, it is triangular (see Example 4 (continued) in Chapter 6).
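The failure of the reproductive property for the Uniform distribution is easy to see numerically. If X + Y were U(0, 2), then P(X + Y ≤ 0.5) would equal 0.25; for the triangular distribution it equals 0.125. The sketch below approximates this probability by a midpoint grid over the unit square; the grid size is an arbitrary assumption.

```python
n = 400                      # grid resolution per axis (arbitrary choice)
h = 1.0 / n
mids = [(i + 0.5) * h for i in range(n)]   # midpoints of the grid cells

def prob_sum_at_most(z):
    """Approximate P(X + Y <= z) for independent X, Y ~ U(0, 1)."""
    count = sum(1 for x in mids for y in mids if x + y <= z)
    return count * h * h     # each cell carries probability h * h

p_half = prob_sum_at_most(0.5)
print(p_half)
```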


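2.1 For any r.v.'s $X_1$, ...

and show that:
(i)
\[ n\bar{S}^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2. \]
(ii) If the r.v.'s have common (finite) expectation $\mu$, then
\[ \sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2 = n\bar{S}^2 + n(\bar{X} - \mu)^2. \]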
2.2 In reference to Exercise 3.1 in Chapter 4, specify the distribution of the
sum X + Y, and write out the expression for the exact probability P(X + 
Y < 10). 


2.3 If the independent r.v.'s X and Y are distributed as B(m, p) and B(n, p), respectively:
(i) What is the distribution of the r.v. X + Y?
(ii) If m = 8, n = 12, and p = 0.25, what is the numerical value of the probability: P(5 < X + Y < 15)?


2.4 The independent r.v.'s $X_1, \ldots, X_n$ are distributed as B(1, p), and let $S_n = X_1 + \cdots + X_n$.
(i) Determine the distribution of the r.v. $S_n$.
(ii) What is the $EX_i$ and the $Var(X_i)$, i = 1, ..., n?
(iii) From part (ii) and the definition of $S_n$, compute the $ES_n$ and $Var(S_n)$.


2.5 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with p.d.f. f, and let I be an interval in $\Re$. Let $p = P(X_1 \in I)$.
(i) Express p in terms of the p.d.f. f.
(ii) For k with $1 \le k \le n$, express the probability that at least k of $X_1, \ldots, X_n$ take values in the interval I in terms of p.
(iii) Simplify the expression in part (ii), if f is the Negative Exponential p.d.f. with parameter $\lambda$ and $I = (1/\lambda, \infty)$.
(iv) Find the numerical value of the probability in part (iii) for n = 4 and k = 2.


2.6 The breakdown voltage of a randomly chosen diode of a certain type is 
known to be Normally distributed with mean value 40V and s.d. 1.5V. 
(i) What is the probability that the voltage of a single diode is between 
39 and 42? 
(ii) If 5 diodes are independently chosen, what is the probability that at 
least one has a voltage exceeding 42? 


2.7 Refer to Exercise 1.18 and set $X = X_1 + \cdots + X_n$.
(i) Justify the statement that $X \sim B(n, p)$.
(ii) Suppose that n is large and p is small (both assumptions quite appropriate in the framework of Exercise 1.18), so that:

\[ f(x) = \binom{n}{x} p^x q^{n-x} \simeq e^{-np} \frac{(np)^x}{x!}, \quad x = 0, 1, \ldots \]

If np = 2, calculate the approximate values of the probabilities f(x) for x = 0, 1, 2, 3, and 4.


2.8 The r.v.'s $X_1, \ldots, X_n$ are independent and $X_i \sim P(\lambda_i)$:
(i) What is the distribution of the r.v. $X = X_1 + \cdots + X_n$?
(ii) If $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$, calculate the $E\bar{X}$ and the $Var(\bar{X})$ in terms of $\lambda_1, \ldots, \lambda_n$, and n.
(iii) What do the $E\bar{X}$ and the $Var(\bar{X})$ become when the $X_i$'s in part (i) are distributed as $P(\lambda)$?


2.9 Suppose that the number of no-shows for a scheduled airplane flight is a r.v. X distributed as $P(\lambda)$, and it is known from past experience that, on the average, there are 2 no-shows. If there are 5 flights scheduled, compute the following probabilities for the total number of no-shows $X = X_1 + \cdots + X_5$:
(i) 0.
(ii) At most 5.
(iii) 5.
(iv) At least 5.
(v) At most 10.
(vi) 10.
(vii) At least 10.
(viii) At most 15.
(ix) 15.
(x) At least 15.


2.10 The r.v.'s $X_1, \ldots, X_n$ are independent and $X_i \sim P(\lambda_i)$, i = 1, ..., n. Set $T = \sum_{i=1}^{n} X_i$ and $\lambda = \sum_{i=1}^{n} \lambda_i$, and show that:
(i) The conditional p.d.f. of $X_i$, given T = t, is $B(t, \lambda_i/\lambda)$, i = 1, ..., n.
(ii) What does the distribution in part (i) become for $\lambda_1 = \cdots = \lambda_n = c$, say?


2.11 If the independent r.v.'s X and Y are distributed as $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively:
(i) Specify the distribution of X - Y.
(ii) Calculate the probability P(X > Y) in terms of $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$.
(iii) If $\mu_1 = \mu_2$, conclude that P(X > Y) = 0.5.


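2.12 The m + n r.v.'s $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are independent and $X_i \sim N(\mu_1, \sigma_1^2)$, i = 1, ..., m, $Y_j \sim N(\mu_2, \sigma_2^2)$, j = 1, ..., n. Set $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$, $\bar{Y} = \frac{1}{n} \sum_{j=1}^{n} Y_j$ and:
(i) Calculate the probability $P(\bar{X} > \bar{Y})$ in terms of m, n, $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$.
(ii) Give the numerical value of the probability in part (i) when $\mu_1 = \mu_2$ unspecified.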


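2.13 Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as $N(\mu, \sigma^2)$ and set $X = \sum_{i=1}^{n} \alpha_i X_i$, $Y = \sum_{i=1}^{n} \beta_i X_i$, where the $\alpha_i$'s and the $\beta_i$'s are constants. Then:
(i) Determine the p.d.f.'s of the r.v.'s X and Y.
(ii) Show that the joint m.g.f. of X and Y is given by:

\[ M_{X,Y}(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right], \]

where $\mu_1 = \mu \sum_{i=1}^{n} \alpha_i$, $\mu_2 = \mu \sum_{i=1}^{n} \beta_i$, $\sigma_1^2 = \sigma^2 \sum_{i=1}^{n} \alpha_i^2$, $\sigma_2^2 = \sigma^2 \sum_{i=1}^{n} \beta_i^2$, $\rho = \left(\sigma^2 \sum_{i=1}^{n} \alpha_i \beta_i\right) / \sigma_1 \sigma_2$.
(iii) From part (ii), conclude that X and Y have the Bivariate Normal distribution with correlation coefficient

\[ \rho(X, Y) = \rho = \left(\sigma^2 \sum_{i=1}^{n} \alpha_i \beta_i\right) \Big/ \sigma_1 \sigma_2. \]

(iv) From part (iii), conclude that X and Y are independent if and only if $\sum_{i=1}^{n} \alpha_i \beta_i = 0$.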


2.14 Let X and Y be independent r.v.'s distributed as $N(0, \sigma^2)$.
(i) Set $R = \sqrt{X^2 + Y^2}$ and determine the probability: $P(R \le r)$, for r > 0.
(ii) What is the numerical value of $P(R \le r)$ for $\sigma = 1$ and r = 1.665, r = 2.146, r = 2.448, r = 2.716, r = 3.035, and r = 3.255?
Transformation of Random Variables


This chapter is devoted to transforming a given set of r.v.'s to another set of r.v.'s.
The practical need for such transformations will become apparent by means 
of concrete examples to be cited and/or discussed. The chapter consists of five 
sections. In the first section, a single r.v. is transformed into another single r.v. 
In the following section, the number of available r.v.'s is at least two, and they 
are to be transformed into another set of r.v.’s of the same or smaller number. 
Two specific applications produce two new distributions, the t-distribution 
and the F-distribution, which are of great applicability in statistics. A brief 
account of specific kinds of transformations is given in the subsequent two 
sections, and the chapter is concluded with a section on order statistics. 
EXAMPLE 1 Suppose that the r.v.'s X and Y represent the temperature in a certain locality measured in degrees Celsius and Fahrenheit, respectively. Then it is known that X and Y are related as follows: $Y = \frac{9}{5}X + 32$.


This simple example illustrates the need for transforming a r.v. X into another 
r.v. Y, if Celsius degrees are to be transformed into Fahrenheit degrees. 


EXAMPLE 2 As another example, let the r.v. X denote the velocity of a molecule of mass m. Then it is known that the kinetic energy of the molecule is a r.v. Y related to X in the following manner: $Y = \frac{1}{2}mX^2$.


Thus, determining the distribution of the kinetic energy of the molecule in- 
volves transforming the r.v. X as indicated above. 
The formulation of the general problem is as follows: Let X be a r.v. of the continuous type with p.d.f. $f_X$, and let h be a real-valued function defined on $\Re$. Define the r.v. Y by Y = h(X) and determine its p.d.f. $f_Y$. Under suitable regularity conditions, this problem can be resolved in two ways. One is to determine first the d.f. $F_Y$ and then obtain $f_Y$ by differentiation, and the other is to obtain $f_Y$ directly.





THEOREM 1
Let $S \subseteq \Re$ be the set over which $f_X$ is strictly positive, let $h: S \to T$ (the image of S under h) $\subseteq \Re$ be one-to-one (that is, to distinct x's in S there correspond distinct y's in T) and strictly monotone, and let Y = h(X). For $x \in S$, set $y = h(x) \in T$. Then $F_Y(y) = F_X[h^{-1}(y)]$, if h is increasing, and $F_Y(y) = 1 - F_X[h^{-1}(y)]$, if h is decreasing.








PROOF Inverting the function y = h(x), we get $x = h^{-1}(y)$. Then for increasing h (which implies increasing $h^{-1}$), we have:

\[ F_Y(y) = P(Y \le y) = P[h(X) \le y] = P\{h^{-1}[h(X)] \le h^{-1}(y)\} = P[X \le h^{-1}(y)] = F_X[h^{-1}(y)]. \tag{1} \]

If h is decreasing, then so is $h^{-1}$ and therefore:

\[ F_Y(y) = P[h(X) \le y] = P\{h^{-1}[h(X)] \ge h^{-1}(y)\} = P[X \ge h^{-1}(y)] = 1 - P[X < h^{-1}(y)] = 1 - P[X \le h^{-1}(y)] = 1 - F_X[h^{-1}(y)]. \tag{2} \] ▲


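As an illustration, consider the case $Y = \frac{9}{5}X + 32$ in Example 1 above. Here $y = h(x) = \frac{9}{5}x + 32$ is one-to-one and strictly increasing. Therefore

\[ F_Y(y) = F_X\left[\frac{5}{9}(y - 32)\right] \quad \text{and} \quad f_Y(y) = \frac{5}{9} f_X\left[\frac{5}{9}(y - 32)\right]. \tag{3} \]

The function y = h(x) may not be one-to-one and strictly monotone on the entire S, but it is so on subsets of it. Then $F_Y$ can still be determined. Example 2 above illustrates the point. Let $Y = \frac{1}{2}mX^2$ as mentioned above. Then proceed as follows: For y > 0:

\[ F_Y(y) = P(Y \le y) = P\left(\frac{1}{2}mX^2 \le y\right) = P\left(X^2 \le \frac{2y}{m}\right) = P\left(-\sqrt{\frac{2y}{m}} \le X \le \sqrt{\frac{2y}{m}}\right) \]
\[ = P\left(X \le \sqrt{\frac{2y}{m}}\right) - P\left(X < -\sqrt{\frac{2y}{m}}\right) = F_X\left(\sqrt{\frac{2y}{m}}\right) - F_X\left(-\sqrt{\frac{2y}{m}}\right). \tag{4} \]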
Differentiating in (1) or (2) (depending on whether h is increasing or decreasing), we obtain the p.d.f. of Y, namely,

\[ f_Y(y) = \frac{d}{dy} F_Y(y) = f_X[h^{-1}(y)] \left|\frac{d}{dy} h^{-1}(y)\right|, \quad y \in T. \tag{5} \]

In the case of formula (3), relation (5) gives: $f_Y(y) = \frac{5}{9} f_X\left[\frac{5}{9}(y - 32)\right]$, as has already been seen. In the case of formula (4),

\[ f_Y(y) = \left[f_X\left(\sqrt{\frac{2y}{m}}\right) + f_X\left(-\sqrt{\frac{2y}{m}}\right)\right] \frac{1}{\sqrt{2my}}, \quad y > 0. \tag{6} \]
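Formula (6) can be tried out on a case where the answer is known. As an assumption made only for this illustration, take $X \sim N(0, 1)$ and m = 2, so that $Y = \frac{1}{2}mX^2 = X^2$; the resulting density should then coincide with the $\chi^2_1$ p.d.f. $e^{-y/2}/\sqrt{2\pi y}$. The function names below are hypothetical.

```python
from math import exp, pi, sqrt

def phi(x):
    """p.d.f. of N(0, 1); plays the role of f_X in this illustration."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def f_Y(y, m, f_X=phi):
    """p.d.f. of Y = (1/2) m X^2 according to formula (6)."""
    if y <= 0:
        return 0.0
    root = sqrt(2 * y / m)
    return (f_X(root) + f_X(-root)) / sqrt(2 * m * y)

def chi2_1_pdf(y):
    """p.d.f. of the chi-square distribution with 1 d.f., for y > 0."""
    return exp(-y / 2) / sqrt(2 * pi * y)

for y in (0.1, 1.0, 2.5):
    print(y, f_Y(y, m=2), chi2_1_pdf(y))
```

The two positive-root and negative-root terms in `f_Y` correspond to the two branches of the transformation, one for x > 0 and one for x < 0.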


Instead of going through the d.f. (which process requires monotonicity of the transformation y = h(x)), under certain conditions, $f_Y$ may be obtained directly from $f_X$. Such conditions are described in the following theorem.





THEOREM 2
Let X be a r.v. with positive and continuous p.d.f. on the set $S \subseteq \Re$, and let $h: S \to T$ (the image of S under h) be a one-to-one transformation, so that the inverse $x = h^{-1}(y)$, $y \in T$, exists. Suppose that, for $y \in T$, the derivative $\frac{d}{dy} h^{-1}(y)$ exists, is continuous, and $\ne 0$. Then the p.d.f. of the r.v. Y = h(X) is given by:

\[ f_Y(y) = f_X[h^{-1}(y)] \left|\frac{d}{dy} h^{-1}(y)\right|, \quad y \in T \ (\text{and} = 0 \text{ for } y \notin T). \tag{7} \]














PROOF (rough outline) Let B = [c, d] be an interval in T and suppose B is transformed into the interval A = [a, b] by the inverse transformation $x = h^{-1}(y)$. Then:

\[ P(Y \in B) = P[h(X) \in B] = P(X \in A) = \int_A f_X(x)\,dx. \]

When transforming x into y through the transformation $x = h^{-1}(y)$, $\int_A f_X(x)\,dx = \int_B f_X[h^{-1}(y)] \left|\frac{d}{dy} h^{-1}(y)\right| dy$, according to the theory of changing variables in integrals. Thus,

\[ P(Y \in B) = \int_B f_X[h^{-1}(y)] \left|\frac{d}{dy} h^{-1}(y)\right| dy, \]

which implies that the integrand is the p.d.f. of Y. ▲


Relation (7) has already been illustrated by Example 1. A slightly more 
general case is the following one. 


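EXAMPLE 3 Determine the p.d.f. of the r.v. Y defined by: Y = aX + b ($a \ne 0$). In particular, determine $f_Y$, if $X \sim N(\mu, \sigma^2)$.

DISCUSSION The transformation y = ax + b gives $x = h^{-1}(y) = \frac{y - b}{a}$, so that $\frac{d}{dy} h^{-1}(y) = \frac{1}{a}$. Therefore: $f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)$. For the special case:

\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,|a|\sigma} \exp\left[-\frac{\left(\frac{y - b}{a} - \mu\right)^2}{2\sigma^2}\right] = \frac{1}{\sqrt{2\pi}\,|a|\sigma} \exp\left\{-\frac{[y - (a\mu + b)]^2}{2(a\sigma)^2}\right\}. \]

Thus, if $X \sim N(\mu, \sigma^2)$, then $Y = aX + b \sim N(a\mu + b, (a\sigma)^2)$.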
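The conclusion of the discussion can be checked numerically by evaluating relation (7) for the linear transformation and comparing with the $N(a\mu + b, (a\sigma)^2)$ density. In the sketch below, a = 9/5 and b = 32 are taken from Example 1, while the values of $\mu$ and $\sigma$ and the function names are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """p.d.f. of N(mean, sd^2)."""
    return exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sqrt(2 * pi) * sd)

def f_Y(y, a, b, mu, sigma):
    """Relation (7) with h^{-1}(y) = (y - b)/a and |d/dy h^{-1}(y)| = 1/|a|."""
    return normal_pdf((y - b) / a, mu, sigma) / abs(a)

a, b = 9 / 5, 32.0      # Celsius to Fahrenheit, as in Example 1
mu, sigma = 20.0, 5.0   # hypothetical Celsius mean and s.d.

for y in (50.0, 68.0, 80.0):
    print(y, f_Y(y, a, b, mu, sigma), normal_pdf(y, a * mu + b, abs(a) * sigma))
```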

The following result is a modification of Theorem 2 for the case where the transformation $h: S \to T$ is not one-to-one on the whole of S, but is one-to-one on each piece of a suitable partition of S. This result has already been illustrated by (6) in connection with Example 2.





THEOREM 3
Let X be a r.v. with positive and continuous p.d.f. on the set $S \subseteq \Re$, and suppose that the transformation $h: S \to T$ is not one-to-one. Suppose further that when S is partitioned into the pairwise disjoint subsets $S_1, \ldots, S_r$ and h is restricted to $S_j$ and takes values in $T_j$ (the image of $S_j$ under h), then h is one-to-one. Denoting by $h_j$ this restriction, we have then: $h_j: S_j \to T_j$ is one-to-one, so that the inverse $x = h_j^{-1}(y)$, $y \in T_j$, exists, j = 1, ..., r. Finally, we suppose that, for any $y \in T_j$, j = 1, ..., r, the derivatives $\frac{d}{dy} h_j^{-1}(y)$ exist, are continuous, and $\ne 0$. Then the p.d.f. of the r.v. Y = h(X) is determined as follows: Set

\[ f_{Y_j}(y) = f_X[h_j^{-1}(y)] \left|\frac{d}{dy} h_j^{-1}(y)\right|, \quad y \in T_j, \ j = 1, \ldots, r, \]

and for $y \in T$, suppose that y belongs to k of the r $T_j$'s, $1 \le k \le r$. Then $f_Y(y)$ is the sum of the corresponding k $f_{Y_j}(y)$'s. Alternatively,

\[ f_Y(y) = \sum_{j=1}^{r} \delta_j(y) f_{Y_j}(y), \quad y \in T \ (\text{and} = 0 \text{ for } y \notin T), \tag{8} \]

where $\delta_j(y) = 1$, if $y \in T_j$, and $\delta_j(y) = 0$, if $y \notin T_j$, j = 1, ..., r.








REMARK 1 It is to be noticed that, whereas the subsets $S_1, \ldots, S_r$ are pairwise disjoint, their images $T_1, \ldots, T_r$ need not be so. For instance, in Example 2, $S_1 = (0, \infty)$, $S_2 = (-\infty, 0)$ but $T_1 = T_2 = (0, \infty)$.


1.1 The r.v. X has p.d.f. $f_X(x) = (1 - \alpha)\alpha^x$, x = 0, 1, ... ($0 < \alpha < 1$), and set
Y = X’. Determine the p.d.f. fy. 


1.2 Let the r.v.'s X and Y represent the temperature of a certain object in degrees Celsius and Fahrenheit, respectively. Then, it is known that $Y = \frac{9}{5}X + 32$ and $X = \frac{5}{9}Y - \frac{160}{9}$.
(i) If $Y \sim N(\mu, \sigma^2)$, determine the distribution of X.
(ii) If $P(90 \le Y \le 95) = 0.95$, then also $P(a \le X \le b) = 0.95$, for some a < b. Determine the numbers a and b.
(iii) We know that: $P(\mu - \sigma \le Y \le \mu + \sigma) \simeq 0.6827 = p_1$, $P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \simeq 0.9545 = p_2$, and $P(\mu - 3\sigma \le Y \le \mu + 3\sigma) \simeq 0.9973 = p_3$. Calculate the intervals $[a_k, b_k]$, k = 1, 2, 3 for which $P(a_k \le X \le b_k)$ is, respectively, equal to $p_k$, k = 1, 2, 3.


1.3 Let the r.v. X have p.d.f. $f_X$ positive on the set $S \subseteq \Re$, and set U = aX + b, where a and b are constants and a > 0.
(i) Use Theorem 2 in order to derive the p.d.f. $f_U$.
(ii) If X has the Negative Exponential distribution with parameter $\lambda$, show that U has the same kind of distribution with parameter $\lambda/a$.
(iii) If $X \sim U(c, d)$, then show that $U \sim U(ac + b, ad + b)$.


1.4 If the r.v. X has the Negative Exponential distribution with parameter $\lambda$, set $Y = e^X$ and $Z = \log X$ and determine the p.d.f.'s $f_Y$ and $f_Z$.


1.5 Let $X \sim U(\alpha, \beta)$ and set $Y = e^X$. Then determine the p.d.f. $f_Y$. If $\alpha > 0$, set $Z = \log X$ and determine the p.d.f. $f_Z$.


1.6 (i) If the r.v. X is distributed as U(0, 1) and $Y = -2 \log X$, show that Y is distributed as $\chi^2_2$.
(ii) If $X_1, \ldots, X_n$ is a random sample from the U(0, 1) distribution and $Y_i = -2 \log X_i$, use part (i) and the m.g.f. approach in order to show that $\sum_{i=1}^{n} Y_i$ is distributed as $\chi^2_{2n}$.


1.7 If the r.v. X has the p.d.f. $f_X(x) = \frac{1}{\sqrt{2\pi}} x^{-2} e^{-1/(2x^2)}$, $x \in \Re$, show that the r.v. $Y = \frac{1}{X} \sim N(0, 1)$.


1.8 Suppose that the velocity of a molecule of mass m is a r.v. X with p.d.f. $f_X(x) = \sqrt{\frac{2}{\pi}}\, x^2 e^{-x^2/2}$, x > 0 (the so-called Maxwell distribution). Derive the p.d.f. of the r.v. $Y = \frac{1}{2}mX^2$, which is the kinetic energy of the molecule.


1.9 If the r.v. $X \sim N(0, 1)$, use Theorem 3 in order to show that the r.v. $Y = X^2 \sim \chi^2_1$.


1.10 Let $X_r$ be a r.v. distributed as t with r degrees of freedom: $X_r \sim t_r$ (r = 1, 2, ...) whose p.d.f. is given in relation (10) below. Then show that:
(i) $EX_r$ does not exist for r = 1.
(ii) $EX_r = 0$ for $r \ge 2$.
(iii) $Var(X_r) = \frac{r}{r - 2}$ for $r \ge 3$.

Hint: That $EX_r$ does not exist for r = 1 is, actually, reduced to Exercise 1.16 in Chapter 3. That $EX_r = 0$ for $r \ge 2$ follows by a simple integration. So, all that remains to calculate is $EX_r^2$. For this purpose, first reduce the original integral to an integral over the interval $(0, \infty)$, by symmetry of the region of integration and the fact that the integrand is an even function. Then, use the transformation $\frac{t^2}{r} = x$, and next the transformation $\frac{1}{1 + x} = y$. Except for constants, the integral is then reduced to the form

\[ \int_0^1 y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy \quad (\alpha > 0, \beta > 0). \]

At this point, use the following fact:

\[ \int_0^1 y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \]

(A proof of this fact may be found, e.g., in pages 70-71, of the book A Course in Mathematical Statistics, 2nd edition (1967), Academic Press, by G. G. Roussas.) The proof is concluded by using the recursive relation of the Gamma function ($\Gamma(\gamma) = (\gamma - 1)\Gamma(\gamma - 1)$) and the fact that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.


6.2 Transforming Two or More Random Variables





Often the need arises to transform two or more given r.v.'s to another set of 
r.v.'s. The following examples illustrate the point. 


EXAMPLE 4 The times of arrival of a bus at two successive bus stops are r.v.'s $X_1$ and $X_2$ distributed as $U(\alpha, \beta)$, for two time points $\alpha < \beta$. Calculate the probabilities $P(X_1 + X_2 > x)$ for $2\alpha < x < 2\beta$.


Clearly, this question calls for the determination of the distribution of the r.v. $X_1 + X_2$.


Or more generally (and more realistically), suppose that a bus makes k stops between its depot and its terminal, and that the arrival time at the ith stop is a r.v. $X_i \sim U(\alpha_i, \beta_i)$, $\alpha_i < \beta_i$, i = 1, ..., k + 1 (where $X_{k+1}$ is the time of arrival at the terminal). Determine the distribution of the duration of the trip $X_1 + \cdots + X_{k+1}$.


EXAMPLE 5 Consider certain events occurring in every time interval $[t_1, t_2]$ ($0 \le t_1 \le t_2$) according to the Poisson distribution $P(\lambda(t_2 - t_1))$. Then the waiting times between successive occurrences are independent r.v.'s distributed according to the Negative Exponential distribution with parameter $\lambda$. Let $X_1$ and $X_2$ be two such times. What is the probability that one would have to wait at least twice as long for the second occurrence as for the first? That is, what is the probability $P(X_2 > 2X_1)$?


Here one would have to compute the distribution of the r.v. $X_2 - 2X_1$.
Below, a brief outline of the theory underpinning the questions posed in the examples is presented. First, consider the case of two r.v.'s $X_1$ and $X_2$ having the joint p.d.f. $f_{X_1, X_2}$. Often the question posed is that of determining the distribution of a function of $X_1$ and $X_2$, $h_1(X_1, X_2)$. The general approach is to set $Y_1 = h_1(X_1, X_2)$ and also consider another (convenient) transformation $Y_2 = h_2(X_1, X_2)$. Next, determine the joint p.d.f. of $Y_1$ and $Y_2$, $f_{Y_1, Y_2}$, and, finally, compute the (marginal) p.d.f. $f_{Y_1}$. Conditions under which $f_{Y_1, Y_2}$ is determined by way of $f_{X_1, X_2}$ are given below.





THEOREM 4
Consider the r.v.'s $X_1$ and $X_2$ with joint p.d.f. $f_{X_1, X_2}$ positive and continuous on the set $S \subseteq \Re^2$, and let $h_1$, $h_2$ be two real-valued transformations defined on S; that is, $h_1, h_2: S \to \Re$, and let T be the image of S under the transformation $(h_1, h_2)$. Suppose that $(h_1, h_2)$ is one-to-one from S onto T. Thus, if we set $y_1 = h_1(x_1, x_2)$ and $y_2 = h_2(x_1, x_2)$, we can solve uniquely for $x_1$, $x_2$: $x_1 = g_1(y_1, y_2)$, $x_2 = g_2(y_1, y_2)$. Suppose further that the partial derivatives $g_{1i}(y_1, y_2) = \frac{\partial}{\partial y_i} g_1(y_1, y_2)$ and $g_{2i}(y_1, y_2) = \frac{\partial}{\partial y_i} g_2(y_1, y_2)$, i = 1, 2 exist and are continuous for $(y_1, y_2) \in T$. Finally, suppose that the Jacobian

\[ J = \begin{vmatrix} g_{11}(y_1, y_2) & g_{12}(y_1, y_2) \\ g_{21}(y_1, y_2) & g_{22}(y_1, y_2) \end{vmatrix} \]

is $\ne 0$ on T. Then the joint p.d.f. of the r.v.'s $Y_1 = h_1(X_1, X_2)$ and $Y_2 = h_2(X_1, X_2)$, $f_{Y_1, Y_2}$, is given by:

\[ f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}[g_1(y_1, y_2), g_2(y_1, y_2)]\,|J|, \quad (y_1, y_2) \in T \tag{9} \]

(and = 0 for $(y_1, y_2) \notin T$).











The justification of this theorem is entirely analogous to that of Theorem 2 
and will be omitted. 

In applying Theorem 4, one must be careful in checking that the underlying 
assumptions hold and in determining correctly the set T. As an illustration, let 
us discuss the first part of Example 4. 


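EXAMPLE 4 (continued) DISCUSSION We have $y_1 = x_1 + x_2$ and let $y_2 = x_2$. Then $x_1 = y_1 - y_2$ and $x_2 = y_2$, so that $\frac{\partial x_1}{\partial y_1} = 1$, $\frac{\partial x_1}{\partial y_2} = -1$, $\frac{\partial x_2}{\partial y_1} = 0$, $\frac{\partial x_2}{\partial y_2} = 1$, and

\[ J = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1. \]

For the determination of S and T, see Figures 6.1 and 6.2.

Since $f_{X_1, X_2}(x_1, x_2) = \frac{1}{(\beta - \alpha)^2}$ for $(x_1, x_2) \in S$, we have $f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{(\beta - \alpha)^2}$ for $(y_1, y_2) \in T$; that is, for $2\alpha < y_1 < 2\beta$, $\alpha < y_2 < \beta$, $\alpha < y_1 - y_2 < \beta$ (and = 0 for $(y_1, y_2) \notin T$).

Thus, we get:

\[ f_{Y_1, Y_2}(y_1, y_2) = \begin{cases} \frac{1}{(\beta - \alpha)^2}, & 2\alpha < y_1 < 2\beta, \ \alpha < y_2 < \beta, \ \alpha < y_1 - y_2 < \beta \\ 0, & \text{otherwise.} \end{cases} \]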
Figure 6.1
$S = \{(x_1, x_2) \in \Re^2;\ f_{X_1, X_2}(x_1, x_2) > 0\}$

Figure 6.2
T = Image of S Under the Transformation Used

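Therefore:

\[ f_{Y_1}(y_1) = \begin{cases} \frac{1}{(\beta - \alpha)^2} \int_{\alpha}^{y_1 - \alpha} dy_2 = \frac{y_1 - 2\alpha}{(\beta - \alpha)^2}, & \text{for } 2\alpha < y_1 \le \alpha + \beta \\ \frac{1}{(\beta - \alpha)^2} \int_{y_1 - \beta}^{\beta} dy_2 = \frac{2\beta - y_1}{(\beta - \alpha)^2}, & \text{for } \alpha + \beta < y_1 < 2\beta \\ 0, & \text{otherwise.} \end{cases} \]

The graph of $f_{Y_1}$ is given in Figure 6.3.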
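With the triangular p.d.f. in hand, the probabilities $P(X_1 + X_2 > x)$ asked for in Example 4 follow by integrating $f_{Y_1}$ over $(x, 2\beta)$. The sketch below does this numerically by the midpoint rule; the values $\alpha = 1$, $\beta = 3$, the number of steps, and the function names are assumptions made for illustration.

```python
def f_Y1(y, alpha, beta):
    """Triangular p.d.f. of Y1 = X1 + X2 for independent X1, X2 ~ U(alpha, beta)."""
    w2 = (beta - alpha) ** 2
    if 2 * alpha < y <= alpha + beta:
        return (y - 2 * alpha) / w2
    if alpha + beta < y < 2 * beta:
        return (2 * beta - y) / w2
    return 0.0

def prob_sum_exceeds(x, alpha, beta, steps=20000):
    """Midpoint-rule approximation of P(X1 + X2 > x)."""
    lo, hi = max(x, 2 * alpha), 2 * beta
    if lo >= hi:
        return 0.0
    h = (hi - lo) / steps
    return sum(f_Y1(lo + (i + 0.5) * h, alpha, beta) for i in range(steps)) * h

alpha, beta = 1.0, 3.0   # hypothetical time points
print(prob_sum_exceeds(2.0, alpha, beta))  # whole range: close to 1
print(prob_sum_exceeds(4.0, alpha, beta))  # upper half of the triangle: close to 0.5
print(prob_sum_exceeds(5.0, alpha, beta))  # close to 0.125
```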


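EXAMPLE 5 (continued) DISCUSSION Here $y_1 = x_2 - 2x_1 = -2x_1 + x_2$ and let $y_2 = x_2$. Then $x_1 = -\frac{1}{2}y_1 + \frac{1}{2}y_2$ and $x_2 = y_2$, so that

\[ J = \begin{vmatrix} -\frac{1}{2} & \frac{1}{2} \\ 0 & 1 \end{vmatrix} = -\frac{1}{2} \quad \text{and} \quad |J| = \frac{1}{2}. \]

Clearly, S is the first quadrant. As for T, we have $y_2 = x_2$, so that $y_2 > 0$.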
Figure 6.3
This Density Is Known as the Triangular p.d.f.

Also, $-\frac{1}{2}y_1 + \frac{1}{2}y_2 = x_1$, so that $-\frac{1}{2}y_1 + \frac{1}{2}y_2 > 0$ or $-y_1 + y_2 > 0$ or $y_2 > y_1$. The conditions $y_2 > 0$ and $y_2 > y_1$ determine T (see Figure 6.4).


Figure 6.4 





















































$T$ is the part of the plane above the $y_1$-axis and also above the main diagonal $y_1 = y_2$.

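Since $f_{X_1,X_2}(x_1, x_2) = \lambda^2 e^{-\lambda(x_1 + x_2)}$ $(x_1, x_2 > 0)$, we have $f_{Y_1,Y_2}(y_1, y_2) = \frac{\lambda^2}{2} e^{\frac{\lambda}{2} y_1 - \frac{3\lambda}{2} y_2}$, $(y_1, y_2) \in T$ (and $= 0$ otherwise). Therefore $f_{Y_1}(y_1)$ is obtained by integrating out $y_2$. More precisely, for $y_1 < 0$:

\[
f_{Y_1}(y_1) = \frac{\lambda^2}{2} e^{\frac{\lambda}{2} y_1} \int_0^{\infty} e^{-\frac{3\lambda}{2} y_2}\, dy_2
= \frac{\lambda^2}{2} e^{\frac{\lambda}{2} y_1} \times \frac{2}{3\lambda}
= \frac{\lambda}{3} e^{\frac{\lambda}{2} y_1},
\]

whereas for $y_1 > 0$:

\[
f_{Y_1}(y_1) = \frac{\lambda^2}{2} e^{\frac{\lambda}{2} y_1} \int_{y_1}^{\infty} e^{-\frac{3\lambda}{2} y_2}\, dy_2
= \frac{\lambda^2}{2} e^{\frac{\lambda}{2} y_1} \times \frac{2}{3\lambda} e^{-\frac{3\lambda}{2} y_1}
= \frac{\lambda}{3} e^{-\lambda y_1}.
\]

To summarize:

\[
f_{Y_1}(y_1) =
\begin{cases}
\frac{\lambda}{3} e^{\frac{\lambda}{2} y_1}, & y_1 < 0 \\
\frac{\lambda}{3} e^{-\lambda y_1}, & y_1 \ge 0.
\end{cases}
\]

Therefore $P(X_2 > 2X_1) = P(X_2 - 2X_1 > 0) = P(Y_1 > 0) = \frac{\lambda}{3} \int_0^{\infty} e^{-\lambda y_1}\, dy_1 = \frac{1}{3}$.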


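REMARK 2 To be sure, the preceding probability is also calculated as follows:

\[
\begin{aligned}
P(X_2 > 2X_1) &= \iint_{(x_2 > 2x_1)} \lambda^2 e^{-\lambda x_1 - \lambda x_2}\, dx_1\, dx_2 \\
&= \int_0^{\infty} \lambda e^{-\lambda x_2} \left( \int_0^{x_2/2} \lambda e^{-\lambda x_1}\, dx_1 \right) dx_2 \\
&= \int_0^{\infty} \lambda e^{-\lambda x_2} \left( 1 - e^{-\frac{\lambda}{2} x_2} \right) dx_2 \\
&= \int_0^{\infty} \lambda e^{-\lambda x_2}\, dx_2 - \frac{2}{3} \int_0^{\infty} \frac{3\lambda}{2} e^{-\frac{3\lambda}{2} x_2}\, dx_2 = 1 - \frac{2}{3} = \frac{1}{3}.
\end{aligned}
\]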
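The value $1/3$ can also be checked by simulation. The sketch below is illustrative only: the function name, the sample size, and the seed are assumptions, and `random.expovariate` from the Python standard library is used to draw Negative Exponential variates with rate $\lambda$.

```python
import random


def estimate_prob_x2_exceeds_twice_x1(lam, n=200_000, seed=1):
    """Monte Carlo estimate of P(X2 > 2*X1) for X1, X2 i.i.d. Negative Exponential(lam)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x1 = rng.expovariate(lam)
        x2 = rng.expovariate(lam)
        if x2 > 2 * x1:
            hits += 1
    return hits / n
```

The estimate should be close to $1/3$ whatever the value of $\lambda$, in agreement with the fact that $\lambda$ cancels out in both calculations above.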


Applications of Theorem 4 lead to two new distributions, which are of great importance in statistics. They are the t-distribution and the F-distribution.


DEFINITION 1

Let $X$ and $Y$ be two independent r.v.'s distributed as follows: $X \sim N(0, 1)$ and $Y \sim \chi^2_r$, and define the r.v. $T$ by: $T = X/\sqrt{Y/r}$. The r.v. $T$ is said to have the (Student's) t-distribution with $r$ degrees of freedom (d.f.). The notation used is: $T \sim t_r$.

The p.d.f. of $T$, $f_T$, is given by the formula:

\[
f_T(t) = \frac{\Gamma\left[\frac{1}{2}(r+1)\right]}{\sqrt{\pi r}\,\Gamma(r/2)} \times \frac{1}{\left[1 + (t^2/r)\right]^{(1/2)(r+1)}}, \quad t \in \mathbb{R}, \tag{10}
\]

and its graph (for $r = 5$) is presented in Figure 6.5.

Figure 6.5  Two Curves of the t Probability Density Function: $f_T(t)$ for $t_5$ and for $t_\infty$ (the $N(0, 1)$ p.d.f.)

From formula (10), it is immediate that $f_T$ is symmetric about 0 and tends to 0 as $t \to \pm\infty$. It can also be seen (see Exercise 2.10) that $f_T(t)$ tends to the p.d.f. of the $N(0, 1)$ distribution as the number $r$ of d.f. tends to $\infty$. This is depicted in Figure 6.5 by means of the curve denoted by $t_\infty$. Also, it is seen (see Exercise 2.9) that $ET = 0$ for $r \ge 2$, and $Var(T) = \frac{r}{r-2}$ for $r \ge 3$. Finally, the probabilities $P(T \le t)$, for selected values of $t$ and $r$, are given by tables (the t-tables). For $r \ge 91$, one may use the tables for the standard Normal distribution.
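The convergence of $f_T$ to the $N(0, 1)$ p.d.f. can be observed numerically. The following is a small sketch under stated assumptions: the function names are illustrative, and the constant in (10) is evaluated through `math.lgamma` (the logarithm of the Gamma function) only to avoid overflow for large $r$; this is a computational convenience, not something taken from the text.

```python
import math


def t_pdf(t, r):
    """Formula (10), evaluated on the log scale so that large r does not overflow."""
    log_const = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                 - 0.5 * math.log(math.pi * r))
    return math.exp(log_const - 0.5 * (r + 1) * math.log1p(t * t / r))


def normal_pdf(t):
    """p.d.f. of the N(0, 1) distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)


def gap(t, r):
    """Absolute difference between the t_r and the N(0, 1) p.d.f.'s at the point t."""
    return abs(t_pdf(t, r) - normal_pdf(t))
```

Evaluating `gap(0.0, r)` for $r = 5, 50, 500$ shows the difference shrinking as $r$ grows, which is the behavior depicted by the two curves in Figure 6.5.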

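Regarding the derivation of $f_T$, we have:

\[
f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-(1/2)x^2}, \quad x \in \mathbb{R},
\qquad
f_Y(y) =
\begin{cases}
\dfrac{1}{\Gamma\left(\frac{1}{2}r\right) 2^{(1/2)r}}\, y^{(r/2)-1} e^{-y/2}, & y > 0 \\
0, & y \le 0.
\end{cases}
\]

Set $U = Y$ and consider the transformation

\[
(h_1, h_2):\quad t = \frac{x}{\sqrt{y/r}},\ u = y; \quad \text{then} \quad x = \frac{1}{\sqrt{r}}\, t \sqrt{u},\ y = u,
\]

and

\[
J = \begin{vmatrix} \dfrac{\sqrt{u}}{\sqrt{r}} & \dfrac{t}{2\sqrt{u}\sqrt{r}} \\ 0 & 1 \end{vmatrix} = \frac{\sqrt{u}}{\sqrt{r}}.
\]

Therefore, for $t \in \mathbb{R}$, $u > 0$, we get

\[
\begin{aligned}
f_{T,U}(t, u) &= \frac{1}{\sqrt{2\pi}} e^{-t^2 u/2r} \times \frac{1}{\Gamma(r/2) 2^{r/2}}\, u^{(r/2)-1} e^{-u/2} \times \frac{\sqrt{u}}{\sqrt{r}} \\
&= \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}}\, u^{(1/2)(r+1)-1} \exp\left[ -\frac{u}{2}\left( 1 + \frac{t^2}{r} \right) \right].
\end{aligned}
\]

Hence

\[
f_T(t) = \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}}\, u^{(1/2)(r+1)-1} \exp\left[ -\frac{u}{2}\left( 1 + \frac{t^2}{r} \right) \right] du.
\]

Set

\[
\frac{u}{2}\left( 1 + \frac{t^2}{r} \right) = z, \quad \text{so that} \quad u = 2z\left( 1 + \frac{t^2}{r} \right)^{-1}, \quad du = 2\left( 1 + \frac{t^2}{r} \right)^{-1} dz,
\]

and $z \in [0, \infty)$. Therefore we continue as follows:

\[
\begin{aligned}
f_T(t) &= \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2}} \left[ \frac{2z}{1 + (t^2/r)} \right]^{(1/2)(r+1)-1} e^{-z}\, \frac{2}{1 + (t^2/r)}\, dz \\
&= \frac{2^{(1/2)(r+1)}}{\sqrt{2\pi r}\,\Gamma(r/2) 2^{r/2} \left[ 1 + (t^2/r) \right]^{(1/2)(r+1)}} \int_0^{\infty} z^{(1/2)(r+1)-1} e^{-z}\, dz \\
&= \frac{1}{\sqrt{\pi r}\,\Gamma(r/2)} \times \frac{1}{\left[ 1 + (t^2/r) \right]^{(1/2)(r+1)}} \times \Gamma\left[ \tfrac{1}{2}(r+1) \right],
\end{aligned}
\]

since $\frac{1}{\Gamma\left[\frac{1}{2}(r+1)\right]}\, z^{(1/2)(r+1)-1} e^{-z}$ $(z > 0)$ is the p.d.f. of the Gamma distribution with parameters $\alpha = \frac{r+1}{2}$ and $\beta = 1$; that is,

\[
f_T(t) = \frac{\Gamma\left[\frac{1}{2}(r+1)\right]}{\sqrt{\pi r}\,\Gamma(r/2)} \times \frac{1}{\left[ 1 + (t^2/r) \right]^{(1/2)(r+1)}}, \quad t \in \mathbb{R}.
\]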


Now, we proceed with the definition of the F-distribution.


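DEFINITION 2

Let $X$ and $Y$ be two independent r.v.'s distributed as follows: $X \sim \chi^2_{r_1}$ and $Y \sim \chi^2_{r_2}$, and define the r.v. $F$ by: $F = \frac{X/r_1}{Y/r_2}$. The r.v. $F$ is said to have the F-distribution with $r_1$ and $r_2$ degrees of freedom (d.f.). The notation often used is: $F \sim F_{r_1, r_2}$.

The p.d.f. of $F$, $f_F$, is given by the formula:

\[
f_F(f) =
\begin{cases}
\dfrac{\Gamma\left[\frac{1}{2}(r_1 + r_2)\right] (r_1/r_2)^{r_1/2}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right)} \times \dfrac{f^{(r_1/2)-1}}{\left[ 1 + (r_1/r_2) f \right]^{(1/2)(r_1 + r_2)}}, & \text{for } f > 0 \\[2ex]
0, & \text{for } f \le 0,
\end{cases}
\tag{11}
\]

and its graphs (for $r_1 = 10$, $r_2 = 4$ and $r_1 = r_2 = 10$) are given in Figure 6.6. The probabilities $P(F \le f)$, for selected values of $f$ and $r_1$, $r_2$, are given by tables (the F-tables).

Figure 6.6  Two Curves of the F Probability Density Function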























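The derivation of $f_F$ is based on Theorem 4 and is as follows. For $x$ and $y > 0$, we have:

\[
f_X(x) = \frac{1}{\Gamma\left(\frac{1}{2} r_1\right) 2^{r_1/2}}\, x^{(r_1/2)-1} e^{-x/2}, \quad x > 0,
\qquad
f_Y(y) = \frac{1}{\Gamma\left(\frac{1}{2} r_2\right) 2^{r_2/2}}\, y^{(r_2/2)-1} e^{-y/2}, \quad y > 0.
\]

We set $Z = Y$, and consider the transformation

\[
(h_1, h_2):\quad f = \frac{x/r_1}{y/r_2},\ z = y; \quad \text{then} \quad x = \frac{r_1}{r_2} f z,\ y = z,
\]

and

\[
J = \begin{vmatrix} \dfrac{r_1}{r_2} z & \dfrac{r_1}{r_2} f \\ 0 & 1 \end{vmatrix} = \frac{r_1}{r_2} z, \quad \text{so that} \quad |J| = \frac{r_1}{r_2} z.
\]

For $f, z > 0$, we get:

\[
\begin{aligned}
f_{F,Z}(f, z) &= \frac{1}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right) 2^{(1/2)(r_1 + r_2)}} \left( \frac{r_1}{r_2} \right)^{(r_1/2)-1} f^{(r_1/2)-1} z^{(r_1/2)-1} z^{(r_2/2)-1} \\
&\qquad \times \exp\left( -\frac{r_1}{2 r_2} f z \right) e^{-z/2}\, \frac{r_1}{r_2} z \\
&= \frac{(r_1/r_2)^{r_1/2} f^{(r_1/2)-1}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right) 2^{(1/2)(r_1 + r_2)}}\, z^{(1/2)(r_1 + r_2)-1} \exp\left[ -\frac{z}{2}\left( \frac{r_1}{r_2} f + 1 \right) \right].
\end{aligned}
\]

Therefore

\[
f_F(f) = \int_0^{\infty} f_{F,Z}(f, z)\, dz
= \frac{(r_1/r_2)^{r_1/2} f^{(r_1/2)-1}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right) 2^{(1/2)(r_1 + r_2)}} \int_0^{\infty} z^{(1/2)(r_1 + r_2)-1} \exp\left[ -\frac{z}{2}\left( \frac{r_1}{r_2} f + 1 \right) \right] dz.
\]

Set

\[
\frac{z}{2}\left( \frac{r_1}{r_2} f + 1 \right) = t, \quad \text{so that} \quad z = 2t\left( \frac{r_1}{r_2} f + 1 \right)^{-1}, \quad dz = 2\left( \frac{r_1}{r_2} f + 1 \right)^{-1} dt, \quad t \in [0, \infty).
\]

Thus continuing, we have

\[
\begin{aligned}
f_F(f) &= \frac{(r_1/r_2)^{r_1/2} f^{(r_1/2)-1}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right) 2^{(1/2)(r_1 + r_2)}}\, 2^{(1/2)(r_1 + r_2)-1} \left( \frac{r_1}{r_2} f + 1 \right)^{-(1/2)(r_1 + r_2)+1} \\
&\qquad \times 2\left( \frac{r_1}{r_2} f + 1 \right)^{-1} \int_0^{\infty} t^{(1/2)(r_1 + r_2)-1} e^{-t}\, dt \\
&= \frac{\Gamma\left[\frac{1}{2}(r_1 + r_2)\right] (r_1/r_2)^{r_1/2}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right)} \times \frac{f^{(r_1/2)-1}}{\left[ 1 + (r_1/r_2) f \right]^{(1/2)(r_1 + r_2)}},
\end{aligned}
\]

since $\frac{1}{\Gamma\left[\frac{1}{2}(r_1 + r_2)\right]}\, t^{(1/2)(r_1 + r_2)-1} e^{-t}$ $(t > 0)$ is the p.d.f. of the Gamma distribution with parameters $\alpha = \frac{r_1 + r_2}{2}$ and $\beta = 1$. Therefore

\[
f_F(f) =
\begin{cases}
\dfrac{\Gamma\left[\frac{1}{2}(r_1 + r_2)\right] (r_1/r_2)^{r_1/2}}{\Gamma\left(\frac{1}{2} r_1\right) \Gamma\left(\frac{1}{2} r_2\right)} \times \dfrac{f^{(r_1/2)-1}}{\left[ 1 + (r_1/r_2) f \right]^{(1/2)(r_1 + r_2)}}, & \text{for } f > 0 \\[2ex]
0, & \text{for } f \le 0.
\end{cases}
\]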
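As a rough numerical check of formula (11), one may simulate $F$ directly from its definition and compare with the integral of the p.d.f. The sketch below is illustrative: the function names, sample size, and seed are assumptions. It relies on the fact that a $\chi^2_r$ variate is a Gamma variate with shape $r/2$ and scale 2, which is what `random.gammavariate(r / 2, 2.0)` draws.

```python
import math
import random


def f_pdf(f, r1, r2):
    """Formula (11) for the F-distribution with r1 and r2 d.f., on the log scale."""
    if f <= 0:
        return 0.0
    log_const = (math.lgamma((r1 + r2) / 2) - math.lgamma(r1 / 2) - math.lgamma(r2 / 2)
                 + (r1 / 2) * math.log(r1 / r2))
    return math.exp(log_const + (r1 / 2 - 1) * math.log(f)
                    - 0.5 * (r1 + r2) * math.log1p(r1 * f / r2))


def cdf_from_pdf(x, r1, r2, steps=20_000):
    """Midpoint-rule approximation of P(F <= x) obtained by integrating (11) over (0, x)."""
    h = x / steps
    return h * sum(f_pdf((i + 0.5) * h, r1, r2) for i in range(steps))


def cdf_by_simulation(x, r1, r2, n=100_000, seed=2):
    """Relative frequency of F <= x, with F = (X/r1)/(Y/r2) built from chi-square draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        chi_1 = rng.gammavariate(r1 / 2, 2.0)  # chi-square with r1 d.f.
        chi_2 = rng.gammavariate(r2 / 2, 2.0)  # chi-square with r2 d.f.
        if (chi_1 / r1) / (chi_2 / r2) <= x:
            hits += 1
    return hits / n
```

For $r_1 = 10$, $r_2 = 4$ (one of the two cases graphed in Figure 6.6), the two approximations of $P(F \le 1.5)$ should agree to about two decimal places.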
REMARK 3
(i) From the definition of the F-distribution, it follows that, if $F \sim F_{r_1, r_2}$, then $\frac{1}{F} \sim F_{r_2, r_1}$.

(ii) If $T \sim t_r$, then $T^2 \sim F_{1, r}$. Indeed, $T = X/\sqrt{Y/r}$, where $X$ and $Y$ are independent, and $X \sim N(0, 1)$, $Y \sim \chi^2_r$. But then $T^2 = \frac{X^2}{Y/r} = \frac{X^2/1}{Y/r} \sim F_{1, r}$, since $X^2 \sim \chi^2_1$ and $X^2$ and $Y$ are independent.

(iii) If $F \sim F_{r_1, r_2}$, then it can be shown (see Exercise 2.10) that

\[
EF = \frac{r_2}{r_2 - 2}, \ \text{for } r_2 \ge 3, \quad \text{and} \quad Var(F) = \frac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)}, \ \text{for } r_2 \ge 5.
\]

One can formulate a version of Theorem 4 for $k\,(>2)$ r.v.'s $X_1, \ldots, X_k$, as well as a version of Theorem 3. In the following, such versions are formulated for reference purposes.

THEOREM 5
Consider the r.v.'s $X_1, \ldots, X_k$ with joint p.d.f. $f_{X_1, \ldots, X_k}$ positive and continuous on the set $S \subseteq \mathbb{R}^k$, and let $h_1, \ldots, h_k$ be real-valued transformations defined on $S$; that is, $h_1, \ldots, h_k : S \to \mathbb{R}$, and let $T$ be the image of $S$ under the transformation $(h_1, \ldots, h_k)$. Suppose that $(h_1, \ldots, h_k)$ is one-to-one from $S$ onto $T$. Thus, if we set $y_i = h_i(x_1, \ldots, x_k)$, $i = 1, \ldots, k$, then we can solve uniquely for $x_i$, $i = 1, \ldots, k$: $x_i = g_i(y_1, \ldots, y_k)$, $i = 1, \ldots, k$. Suppose further that the partial derivatives $g_{ij}(y_1, \ldots, y_k) = \frac{\partial}{\partial y_j} g_i(y_1, \ldots, y_k)$, $i, j = 1, \ldots, k$, exist and are continuous for $(y_1, \ldots, y_k) \in T$. Finally, suppose that the Jacobian

\[
J = \begin{vmatrix}
g_{11}(y_1, \ldots, y_k) & \cdots & g_{1k}(y_1, \ldots, y_k) \\
\vdots & & \vdots \\
g_{k1}(y_1, \ldots, y_k) & \cdots & g_{kk}(y_1, \ldots, y_k)
\end{vmatrix}
\]

is $\neq 0$ on $T$. Then the joint p.d.f. of the r.v.'s $Y_i = h_i(X_1, \ldots, X_k)$, $i = 1, \ldots, k$, $f_{Y_1, \ldots, Y_k}$, is given by:

\[
f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = f_{X_1, \ldots, X_k}[g_1(y_1, \ldots, y_k), \ldots, g_k(y_1, \ldots, y_k)]\,|J|, \quad (y_1, \ldots, y_k) \in T \tag{12}
\]
(and $= 0$ for $(y_1, \ldots, y_k) \notin T$).











A suitable version of the previous result when the transformations $h_1, \ldots, h_k$ are not one-to-one is stated below; it will be employed in Theorem 12 in Section 5.





THEOREM 6
Let $X_1, \ldots, X_k$ be r.v.'s with joint p.d.f. $f_{X_1, \ldots, X_k}$ positive and continuous on the set $S \subseteq \mathbb{R}^k$, and let $h_1, \ldots, h_k$ be real-valued transformations defined on $S$; that is, $h_1, \ldots, h_k : S \to \mathbb{R}$, and let $T$ be the image of $S$ under the transformation $(h_1, \ldots, h_k)$. Suppose that $(h_1, \ldots, h_k)$ is not one-to-one from $S$ onto $T$ but there is a partition of $S$ into (pairwise disjoint) subsets $S_1, \ldots, S_r$ such that when $(h_1, \ldots, h_k)$ is restricted to $S_j$ and takes values in $T_j$ (the image of $S_j$ under $(h_1, \ldots, h_k)$), $j = 1, \ldots, r$, then $(h_1, \ldots, h_k)$ is one-to-one. Denoting by $(h_{1j}, \ldots, h_{kj})$ this restriction, we have then: $(h_{1j}, \ldots, h_{kj}) : S_j \to T_j$ is one-to-one, so that we can solve uniquely for $x_i$, $i = 1, \ldots, k$: $x_i = g_{ji}(y_1, \ldots, y_k)$, $i = 1, \ldots, k$, for each $j = 1, \ldots, r$. Suppose further that the partial derivatives $g_{jil}(y_1, \ldots, y_k) = \frac{\partial}{\partial y_l} g_{ji}(y_1, \ldots, y_k)$, $i, l = 1, \ldots, k$, $j = 1, \ldots, r$, exist and are continuous for $(y_1, \ldots, y_k) \in T_j$, $j = 1, \ldots, r$, and the Jacobian

\[
J_j = \begin{vmatrix}
g_{j11}(y_1, \ldots, y_k) & \cdots & g_{j1k}(y_1, \ldots, y_k) \\
\vdots & & \vdots \\
g_{jk1}(y_1, \ldots, y_k) & \cdots & g_{jkk}(y_1, \ldots, y_k)
\end{vmatrix}
\]

is $\neq 0$ on $T_j$ for $j = 1, \ldots, r$.

Set

\[
f_{Y_j}(y_1, \ldots, y_k) = f_{X_1, \ldots, X_k}[g_{j1}(y_1, \ldots, y_k), \ldots, g_{jk}(y_1, \ldots, y_k)]\,|J_j|, \quad (y_1, \ldots, y_k) \in T_j,\ j = 1, \ldots, r.
\]

Then the joint p.d.f. of the r.v.'s $Y_i = h_i(X_1, \ldots, X_k)$, $i = 1, \ldots, k$, $f_{Y_1, \ldots, Y_k}$, is given by:

\[
f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = \sum_{j=1}^{r} \delta_j(y_1, \ldots, y_k)\, f_{Y_j}(y_1, \ldots, y_k), \quad (y_1, \ldots, y_k) \in T \tag{13}
\]
(and $= 0$ for $(y_1, \ldots, y_k) \notin T$),

where $\delta_j(y_1, \ldots, y_k) = 1$, if $(y_1, \ldots, y_k) \in T_j$ and $\delta_j(y_1, \ldots, y_k) = 0$, if $(y_1, \ldots, y_k) \notin T_j$, $j = 1, \ldots, r$.








2.1 The r.v.'s X and Y denote the outcomes of one independent throw of two 
fair dice, and let Z = X + Y. Determine the distribution of Z. 


2.2 Let the independent r.v.'s $X$ and $Y$ have the Negative Exponential distribution with $\lambda = 1$, and set $U = X + Y$, $V = X/Y$.
(i) Derive the joint p.d.f. $f_{U,V}$.
(ii) Then derive the marginal p.d.f.'s $f_U$ and $f_V$.
(iii) Show that the r.v.'s $U$ and $V$ are independent.


2.3 Let the independent r.v.'s $X$ and $Y$ have the Negative Exponential distribution with $\lambda = 1$, and set $U = \frac{1}{2}(X + Y)$, $V = \frac{1}{2}(X - Y)$.
(i) Show that the joint p.d.f. of the r.v.'s $U$ and $V$ is given by:

\[
f_{U,V}(u, v) = 2e^{-2u}, \quad -u < v < u,\ u > 0.
\]

(ii) Also, show that the marginal p.d.f.'s $f_U$ and $f_V$ are given by:

\[
f_U(u) = 4u e^{-2u},\ u > 0; \qquad f_V(v) = e^{-2v}, \ \text{for } v > 0, \qquad f_V(v) = e^{2v}, \ \text{for } v < 0.
\]


2.4 Let the independent r.v.'s $X$ and $Y$ have the joint p.d.f. $f_{X,Y}$ positive on a set $S$, subset of $\mathbb{R}^2$, and set $U = aX + b$, $V = cY + d$, where $a$, $b$, $c$, and $d$ are constants with $ac \neq 0$.
(i) Use Theorem 4 in order to show that the joint p.d.f. of $U$ and $V$ is given by:

\[
f_{U,V}(u, v) = \frac{1}{|ac|} f_{X,Y}\left( \frac{u - b}{a}, \frac{v - d}{c} \right)
= \frac{1}{|ac|} f_X\left( \frac{u - b}{a} \right) f_Y\left( \frac{v - d}{c} \right), \quad (u, v) \in T,
\]

the image of $S$ under the transformations $u = ax + b$, $v = cy + d$.

(ii) If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, show that $U$ and $V$ are independently distributed as $N(a\mu_1 + b, (a\sigma_1)^2)$ and $N(c\mu_2 + d, (c\sigma_2)^2)$, respectively.





2.5 If the independent r.v.'s $X$ and $Y$ are distributed as $N(0, 1)$, set $U = X + Y$, $V = X - Y$, and:
(i) Determine the p.d.f.'s of $U$ and $V$.
(ii) Show that $U$ and $V$ are independent.
(iii) Compute the probability $P(U < 0, V > 0)$.


2.6 Let $X$ and $Y$ be independent r.v.'s distributed as $N(0, 1)$, and set

\[
U = \frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} Y, \qquad V = \frac{1}{\sqrt{2}} X - \frac{1}{\sqrt{2}} Y.
\]

(i) Determine the joint p.d.f. of $U$ and $V$.
(ii) From the joint p.d.f. $f_{U,V}$, infer $f_U$ and $f_V$ without integration.
(iii) Conclude that $U$ and $V$ are also independent.
(iv) How else could you arrive at the p.d.f.'s $f_U$ and $f_V$?


2.7 Let $X$ and $Y$ be independent r.v.'s distributed as $N(0, \sigma^2)$. Then show that the r.v. $U = X^2 + Y^2$ has the Negative Exponential distribution with parameter $\lambda = 1/2\sigma^2$.


2.8 The r.v.'s $X$ and $Y$ have joint p.d.f. given by: $f_{X,Y}(x, y) = \frac{1}{\pi}$, for $x, y \in \mathbb{R}$ with $x^2 + y^2 \le 1$, and let $Z^2 = X^2 + Y^2$. Use polar coordinates to determine the p.d.f. $f_{Z^2}$.

Hint: Let $Z = +\sqrt{Z^2}$ and set $X = Z\cos\Theta$, $Y = Z\sin\Theta$, where $Z \ge 0$ and $0 < \Theta \le 2\pi$. First, determine the joint p.d.f. $f_{Z,\Theta}$ and then the marginal p.d.f. $f_Z$. Finally, by means of $f_Z$ and the transformation $U = Z^2$, determine the p.d.f. $f_U = f_{Z^2}$.


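2.9 If the r.v. $X_r \sim t_r$, then the t-tables (at least the ones in this book) do not give probabilities for $r > 90$. For such values, we can use instead the Normal tables. The reason for this is that the p.d.f. of $X_r$ converges to the p.d.f. of the $N(0, 1)$ distribution as $r \to \infty$. More precisely,

\[
f_{X_r}(t) = \frac{\Gamma\left( \frac{r+1}{2} \right)}{\sqrt{\pi r}\,\Gamma\left( \frac{r}{2} \right)} \times \frac{1}{\left( 1 + \frac{t^2}{r} \right)^{(r+1)/2}} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \quad \text{as } r \to \infty \quad (t \in \mathbb{R}).
\]

Hint: In proving this convergence, first observe that

\[
\left( 1 + \frac{t^2}{r} \right)^{(r+1)/2} = \left[ \left( 1 + \frac{t^2}{r} \right)^{r} \right]^{1/2} \times \left( 1 + \frac{t^2}{r} \right)^{1/2} \longrightarrow e^{t^2/2} \quad \text{as } r \to \infty,
\]

and then show that

\[
\frac{\Gamma\left( \frac{r+1}{2} \right)}{\sqrt{\pi r}\,\Gamma\left( \frac{r}{2} \right)} \longrightarrow \frac{1}{\sqrt{2\pi}} \quad \text{as } r \to \infty,
\]

by utilizing the Stirling formula. This formula states that:

\[
\frac{\Gamma(n)}{\sqrt{2\pi}\, n^{(2n-1)/2} e^{-n}} \to 1 \quad \text{as } n \to \infty.
\]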





2.10 Let $X_{r_1, r_2}$ be a r.v. having the F-distribution with parameters $r_1$ and $r_2$; i.e., $X_{r_1, r_2} \sim F_{r_1, r_2}$. Then show that:

\[
EX_{r_1, r_2} = \frac{r_2}{r_2 - 2},\ r_2 \ge 3; \qquad Var(X_{r_1, r_2}) = \frac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)},\ r_2 \ge 5.
\]

Hint: Start out with the $k$th moment $EX^k_{r_1, r_2}$, use first the transformation $\frac{r_1}{r_2} f = x$, and second the transformation $\frac{1}{1+x} = y$. Then observe that the integrand is the p.d.f. of a Gamma distribution (except for suitable constants). Thus, the $EX^k_{r_1, r_2}$ is expressed in terms of the Gamma function without carrying out any integrations. Specifically, we find:

\[
EX^k_{r_1, r_2} = \left( \frac{r_2}{r_1} \right)^k \frac{\Gamma\left( \frac{r_1 + 2k}{2} \right) \Gamma\left( \frac{r_2 - 2k}{2} \right)}{\Gamma\left( \frac{r_1}{2} \right) \Gamma\left( \frac{r_2}{2} \right)}, \quad r_2 > 2k.
\]

Applying this formula for $k = 1$ (which requires that $r_2 \ge 3$), and $k = 2$ (which requires that $r_2 \ge 5$), and using the recursive property of the Gamma function, we determine the required expressions.


6.3 Linear Transformations

In this section, a brief discussion is presented for a specific kind of transformation, linear transformations. The basic concepts and results used here can be found in any textbook on linear algebra.

DEFINITION 3
Suppose the variables $x_1, \ldots, x_k$ are transformed into the variables $y_1, \ldots, y_k$ in the following manner:

\[
y_i = \sum_{j=1}^{k} c_{ij} x_j, \quad i = 1, \ldots, k, \tag{14}
\]

where the $c_{ij}$'s are real constants. Such a transformation is called a linear transformation (all the $x_i$'s enter into the transformation in a linear way, in the first power).

Some terminology and elementary facts from matrix algebra will be used here. Denote by $C$ the $k \times k$ matrix of the $c_{ij}$, $i, j = 1, \ldots, k$ constants; that is, $C = (c_{ij})$, and by $|C|$ or $\Delta$ its determinant. Then it is well known that if $\Delta \neq 0$, one can uniquely solve for $x_i$ in (14):

\[
x_i = \sum_{j=1}^{k} d_{ij} y_j, \quad i = 1, \ldots, k, \tag{15}
\]

for suitable constants $d_{ij}$. Denote by $D$ the $k \times k$ matrix of the $d_{ij}$'s and by $\Delta^*$ its determinant: $D = (d_{ij})$, $\Delta^* = |D|$. Then it is known that $\Delta^* = 1/\Delta$. Among the linear transformations, a specific class is of special importance; it is the class of orthogonal transformations.

A linear transformation is said to be orthogonal, if

\[
\sum_{j=1}^{k} c_{ij}^2 = 1 \quad \text{and} \quad \sum_{j=1}^{k} c_{ij} c_{i'j} = 0, \quad i, i' = 1, \ldots, k,\ i \neq i',
\]

or, equivalently,

\[
\sum_{i=1}^{k} c_{ij}^2 = 1 \quad \text{and} \quad \sum_{i=1}^{k} c_{ij} c_{ij'} = 0, \quad j, j' = 1, \ldots, k,\ j \neq j'. \tag{16}
\]

Relations (16) simply state that the row (column) vectors of the matrix $C$ have norm (length) 1, and any two of them are perpendicular. The matrix $C$ itself is also called orthogonal. For an orthogonal matrix $C$, it is known that $|C| = \pm 1$. Also, in the case of an orthogonal matrix $C$, it happens that $d_{ij} = c_{ji}$, $i, j = 1, \ldots, k$; or in matrix notation: $D = C'$, where $C'$ is the transpose of $C$ (the rows of $C'$ are the same as the columns of $C$). Thus, in this case:

\[
x_i = \sum_{j=1}^{k} c_{ji} y_j, \quad i = 1, \ldots, k. \tag{17}
\]

Also, under orthogonality, the vectors of the $x_i$'s and of the $y_j$'s have the same norm. To put it differently:

\[
\sum_{i=1}^{k} x_i^2 = \sum_{j=1}^{k} y_j^2. \tag{18}
\]

Some of these concepts and results are now to be used in connection with r.v.'s.


THEOREM 7
Suppose the r.v.'s $X_1, \ldots, X_k$ are transformed into the r.v.'s $Y_1, \ldots, Y_k$ through a linear transformation with the matrix $C = (c_{ij})$ and $|C| = \Delta \neq 0$. Let $S \subseteq \mathbb{R}^k$ be the set over which the joint p.d.f. of $X_1, \ldots, X_k$, $f_{X_1, \ldots, X_k}$, is positive, and let $T$ be the image of $S$ under the linear transformation. Then:

(i) The joint p.d.f. of $Y_1, \ldots, Y_k$, $f_{Y_1, \ldots, Y_k}$, is given by:

\[
f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = f_{X_1, \ldots, X_k}\left( \sum_{j=1}^{k} d_{1j} y_j, \ldots, \sum_{j=1}^{k} d_{kj} y_j \right) \frac{1}{|\Delta|}, \tag{19}
\]

for $(y_1, \ldots, y_k) \in T$ (and $= 0$ otherwise), where the $d_{ij}$'s are as in (15).

(ii) In particular, if $C$ is orthogonal, then:

\[
f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = f_{X_1, \ldots, X_k}\left( \sum_{j=1}^{k} c_{j1} y_j, \ldots, \sum_{j=1}^{k} c_{jk} y_j \right), \tag{20}
\]

for $(y_1, \ldots, y_k) \in T$ (and $= 0$ otherwise); also,

\[
\sum_{j=1}^{k} Y_j^2 = \sum_{i=1}^{k} X_i^2. \tag{21}
\]

PROOF
(i) Relation (19) follows from Theorem 5.
(ii) Relation (20) follows from (19) and (17), and (21) is a restatement of (18). ▲

Next, we specialize this result to the case that the r.v.'s $X_1, \ldots, X_k$ are Normally distributed and independent.

THEOREM 8
Let the independent r.v.'s $X_1, \ldots, X_k$ be distributed as follows: $X_i \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, k$, and suppose they are transformed into the r.v.'s $Y_1, \ldots, Y_k$ by means of an orthogonal transformation $C$. Then the r.v.'s $Y_1, \ldots, Y_k$ are also independent and Normally distributed as follows:

\[
Y_i \sim N\left( \sum_{j=1}^{k} c_{ij} \mu_j,\ \sigma^2 \right), \quad i = 1, \ldots, k. \tag{22}
\]








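PROOF From the transformations $Y_i = \sum_{j=1}^{k} c_{ij} X_j$, it is immediate that each $Y_i$ is Normally distributed with mean $EY_i = \sum_{j=1}^{k} c_{ij} \mu_j$ and variance $Var(Y_i) = \sum_{j=1}^{k} c_{ij}^2 \sigma^2 = \sigma^2 \sum_{j=1}^{k} c_{ij}^2 = \sigma^2$. So the only thing to be justified is the assertion of independence. From the Normality assumption on the $X_i$'s, we have:

\[
f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^k \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} (x_i - \mu_i)^2 \right]. \tag{23}
\]

Then, since $C$ is orthogonal, (20) applies and gives, by means of (23):

\[
f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^k \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} \left( \sum_{j=1}^{k} c_{ji} y_j - \mu_i \right)^2 \right]. \tag{24}
\]

Thus, the proof is completed by establishing the following algebraic relation:

\[
\sum_{i=1}^{k} \left( \sum_{j=1}^{k} c_{ji} y_j - \mu_i \right)^2 = \sum_{i=1}^{k} \left( y_i - \sum_{j=1}^{k} c_{ij} \mu_j \right)^2 \tag{25}
\]

(see Exercise 3.1). ▲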


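Finally, suppose the orthogonal matrix $C$ in Theorem 8 is chosen to be as follows:

\[
C = \begin{pmatrix}
1/\sqrt{k} & 1/\sqrt{k} & \cdots & \cdots & \cdots & 1/\sqrt{k} \\
1/\sqrt{2 \times 1} & -1/\sqrt{2 \times 1} & 0 & \cdots & \cdots & 0 \\
1/\sqrt{3 \times 2} & 1/\sqrt{3 \times 2} & -2/\sqrt{3 \times 2} & 0 & \cdots & 0 \\
\vdots & & & & & \vdots \\
1/\sqrt{k(k-1)} & 1/\sqrt{k(k-1)} & \cdots & \cdots & 1/\sqrt{k(k-1)} & -(k-1)/\sqrt{k(k-1)}
\end{pmatrix}.
\]

That is, the elements of $C$ are given by the expressions:

\[
\begin{aligned}
c_{1j} &= 1/\sqrt{k}, \quad j = 1, \ldots, k, \\
c_{ij} &= 1/\sqrt{i(i-1)}, \quad \text{for } i = 2, \ldots, k \text{ and } j = 1, \ldots, i-1, \quad \text{and } 0 \text{ for } j = i+1, \ldots, k, \\
c_{ii} &= -(i-1)/\sqrt{i(i-1)}, \quad i = 2, \ldots, k.
\end{aligned}
\]

From these expressions, it readily follows that $\sum_{j=1}^{k} c_{ij}^2 = 1$ for all $i = 1, \ldots, k$, and $\sum_{j=1}^{k} c_{ij} c_{i'j} = 0$ for all $i, i' = 1, \ldots, k$, with $i \neq i'$, so that $C$ is, indeed, orthogonal (see also Exercise 3.2). Next, let $Z_1, \ldots, Z_k$ be independent r.v.'s distributed as $N(0, 1)$, and transform them into the r.v.'s $Y_1, \ldots, Y_k$ by means of $C$; that is,

\[
\begin{aligned}
Y_1 &= \frac{1}{\sqrt{k}} Z_1 + \frac{1}{\sqrt{k}} Z_2 + \cdots + \frac{1}{\sqrt{k}} Z_k \\
Y_2 &= \frac{1}{\sqrt{2 \times 1}} Z_1 - \frac{1}{\sqrt{2 \times 1}} Z_2 \\
Y_3 &= \frac{1}{\sqrt{3 \times 2}} Z_1 + \frac{1}{\sqrt{3 \times 2}} Z_2 - \frac{2}{\sqrt{3 \times 2}} Z_3 \\
&\ \ \vdots \\
Y_k &= \frac{1}{\sqrt{k(k-1)}} Z_1 + \frac{1}{\sqrt{k(k-1)}} Z_2 + \cdots + \frac{1}{\sqrt{k(k-1)}} Z_{k-1} - \frac{k-1}{\sqrt{k(k-1)}} Z_k.
\end{aligned}
\]

Then, by Theorem 8, the r.v.'s $Y_1, \ldots, Y_k$ are independently distributed as $N(0, 1)$, whereas by (21)

\[
\sum_{j=1}^{k} Y_j^2 = \sum_{i=1}^{k} Z_i^2.
\]

However, $Y_1 = \sqrt{k}\,\bar{Z}$, so that

\[
\sum_{j=2}^{k} Y_j^2 = \sum_{j=1}^{k} Y_j^2 - Y_1^2 = \sum_{i=1}^{k} Z_i^2 - (\sqrt{k}\,\bar{Z})^2 = \sum_{i=1}^{k} Z_i^2 - k\bar{Z}^2 = \sum_{i=1}^{k} (Z_i - \bar{Z})^2.
\]

On the other hand, $\sum_{j=2}^{k} Y_j^2$ and $Y_1$ are independent; equivalently, $\sum_{i=1}^{k} (Z_i - \bar{Z})^2$ and $\sqrt{k}\,\bar{Z}$ are independent or

\[
\bar{Z} \ \text{and} \ \sum_{i=1}^{k} (Z_i - \bar{Z})^2 \ \text{are independent.} \tag{26}
\]

This last conclusion is now applied as follows.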





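THEOREM 9
Let $X_1, \ldots, X_k$ be independent r.v.'s distributed as $N(\mu, \sigma^2)$. Then the sample mean $\bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i$ and the sample variance $S^2 = \frac{1}{k-1} \sum_{i=1}^{k} (X_i - \bar{X})^2$ are independent.

PROOF The assumption that $X_i \sim N(\mu, \sigma^2)$ implies that $\frac{X_i - \mu}{\sigma} \sim N(0, 1)$. By setting $Z_i = (X_i - \mu)/\sigma$, $i = 1, \ldots, k$, the $Z_i$'s are as in the preceding derivations and therefore (26) applies. Since

\[
\bar{Z} = \frac{1}{k} \sum_{i=1}^{k} \frac{X_i - \mu}{\sigma} = \frac{1}{\sigma}(\bar{X} - \mu), \quad \text{and} \quad
\sum_{i=1}^{k} (Z_i - \bar{Z})^2 = \sum_{i=1}^{k} \left( \frac{X_i - \mu}{\sigma} - \frac{\bar{X} - \mu}{\sigma} \right)^2 = \frac{1}{\sigma^2} \sum_{i=1}^{k} (X_i - \bar{X})^2,
\]

it follows that $\frac{1}{\sigma}(\bar{X} - \mu)$ and $\frac{1}{\sigma^2} \sum_{i=1}^{k} (X_i - \bar{X})^2$ are independent or that $\bar{X}$ and $\frac{1}{k-1} \sum_{i=1}^{k} (X_i - \bar{X})^2$ are independent. ▲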
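The algebra behind (26) and Theorem 9 can be checked numerically. The sketch below is an illustration only: the function names are assumptions, and matrices are held as plain lists of rows. It builds the matrix $C$ displayed before Theorem 9, verifies that its rows are orthonormal, and confirms that $Y_1 = \sqrt{k}\,\bar{Z}$ and $\sum_{j=2}^{k} Y_j^2 = \sum_{i=1}^{k} (Z_i - \bar{Z})^2$ for an arbitrary vector of $z$ values.

```python
import math


def special_orthogonal_matrix(k):
    """Rows of the k x k matrix C described in the text (row i, for i >= 2, has i - 1 equal entries)."""
    rows = [[1 / math.sqrt(k)] * k]
    for i in range(2, k + 1):
        s = math.sqrt(i * (i - 1))
        rows.append([1 / s] * (i - 1) + [-(i - 1) / s] + [0.0] * (k - i))
    return rows


def gram(rows):
    """Matrix of inner products of the rows; equals the identity iff the rows are orthonormal."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]


def transform(rows, z):
    """y_i = sum_j c_ij z_j, as in relation (14)."""
    return [sum(c * zj for c, zj in zip(row, z)) for row in rows]
```

With `c = special_orthogonal_matrix(5)` and any list `z` of five numbers, `gram(c)` should be the identity up to rounding, and `transform(c, z)` should reproduce the two identities above.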
3.1 Establish relation (25) in the proof of Theorem 8.

Hint: Expand the left-hand side and the right-hand side in (25), use orthogonality, and show that the common value of both sides is:

\[
\sum_{j=1}^{k} y_j^2 + \sum_{j=1}^{k} \mu_j^2 - 2 \sum_{j=1}^{k} \sum_{i=1}^{k} c_{ji} \mu_i y_j.
\]

3.2 Show that the matrix with row elements given by:

\[
\begin{aligned}
c_{1j} &= 1/\sqrt{k}, \quad j = 1, \ldots, k, \\
c_{ij} &= 1/\sqrt{i(i-1)}, \quad i = 2, \ldots, k \text{ and } j = 1, \ldots, i-1, \quad \text{and } 0 \text{ for } j = i+1, \ldots, k, \\
c_{ii} &= -(i-1)/\sqrt{i(i-1)}, \quad i = 2, \ldots, k
\end{aligned}
\]

is orthogonal.
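3.3 Let $X_1, X_2, X_3$ be independent r.v.'s such that $X_i \sim N(\mu_i, \sigma^2)$, $i = 1, 2, 3$, and set

\[
\begin{aligned}
Y_1 &= -\frac{1}{\sqrt{2}} X_1 + \frac{1}{\sqrt{2}} X_2, \\
Y_2 &= -\frac{1}{\sqrt{3}} X_1 - \frac{1}{\sqrt{3}} X_2 + \frac{1}{\sqrt{3}} X_3, \\
Y_3 &= \frac{1}{\sqrt{6}} X_1 + \frac{1}{\sqrt{6}} X_2 + \frac{2}{\sqrt{6}} X_3.
\end{aligned}
\]

Then:

(i) Show that the r.v.'s $Y_1, Y_2, Y_3$ are independent Normally distributed with variance $\sigma^2$ and respective means:

\[
EY_1 = \frac{1}{\sqrt{2}}(-\mu_1 + \mu_2), \quad EY_2 = \frac{1}{\sqrt{3}}(-\mu_1 - \mu_2 + \mu_3), \quad EY_3 = \frac{1}{\sqrt{6}}(\mu_1 + \mu_2 + 2\mu_3).
\]

(ii) If $\mu_1 = \mu_2 = \mu_3 = 0$, then show that $\frac{1}{\sigma^2}(Y_1^2 + Y_2^2 + Y_3^2) \sim \chi^2_3$.

Hint: For part (i), prove that the transformation employed is orthogonal and then use Theorem 8 to conclude independence of $Y_1, Y_2, Y_3$. That the means and the variance are as described follows either from Theorem 8 or directly. Part (ii) follows from part (i) and the assumption that $\mu_1 = \mu_2 = \mu_3 = 0$.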


3.4 If the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, then the r.v.'s $U = \frac{X - \mu_1}{\sigma_1}$, $V = \frac{Y - \mu_2}{\sigma_2}$ have the Bivariate Normal distribution with parameters 0, 0, 1, 1, and $\rho$; and vice versa.

3.5 If the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with parameters 0, 0, 1, 1, and $\rho$, then the r.v.'s $cX$ and $dY$ have the Bivariate Normal distribution with parameters 0, 0, $c^2$, $d^2$, and $\rho_0$, where $\rho_0 = \rho$ if $cd > 0$, and $\rho_0 = -\rho$ if $cd < 0$; $c$ and $d$ are constants with $cd \neq 0$.


3.6 Let the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with parameters 0, 0, 1, 1, and $\rho$, and set: $U = X + Y$, $V = X - Y$. Then show that:
(i) The r.v.'s $U$ and $V$ also have the Bivariate Normal distribution with parameters 0, 0, $2(1 + \rho)$, $2(1 - \rho)$, and 0.
(ii) From part (i), conclude that the r.v.'s $U$ and $V$ are independent.
(iii) From part (i), also conclude that: $U \sim N(0, 2(1 + \rho))$, $V \sim N(0, 2(1 - \rho))$.


3.7 Let the r.v.'s $X$ and $Y$ have the Bivariate Normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, and set:

\[
U = \frac{X - \mu_1}{\sigma_1}, \qquad V = \frac{Y - \mu_2}{\sigma_2}.
\]

Then:

(i) Determine the joint distribution of the r.v.'s $U$ and $V$.

(ii) Show that $U + V$ and $U - V$ have the Bivariate Normal distribution with parameters 0, 0, $2(1 + \rho)$, $2(1 - \rho)$, and 0 and are independent. Also, $U + V \sim N(0, 2(1 + \rho))$, $U - V \sim N(0, 2(1 - \rho))$.

(iii) For $\sigma_1^2 = \sigma_2^2 = \sigma^2$, say, conclude that the r.v.'s $X + Y$ and $X - Y$ are independent.

REMARK 4 Actually, the converse of part (iii) is also true; namely, if $X$ and $Y$ have the Bivariate Normal distribution $N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, then independence of $X + Y$ and $X - Y$ implies $\sigma_1^2 = \sigma_2^2$. The justification of this statement is easier by means of m.g.f.'s, and it was, actually, discussed in Exercise 5.13 of Chapter 4.


3.8 Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as $N(\mu, \sigma^2)$ and suppose that $\mu = k\sigma$ $(k > 0)$. Set

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.
\]

Then:
(i) Determine an expression for the probability:

\[
P(a\mu < \bar{X} < b\mu,\ 0 < S^2 < c\sigma^2),
\]

where $a$, $b$, and $c$ are constants, $a < b$ and $c > 0$.
(ii) Give the numerical value of the probability in part (i) if $a = \frac{1}{2}$, $b = \frac{3}{2}$, $c = 1.487$, $k = 1.5$, and $n = 16$.
Hint: Use independence of $\bar{X}$ and $S^2$ provided by Theorem 9. Also, use the fact that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ by Theorem 6 in Chapter 5 (where $S^2$ is denoted by $\bar{S}^2$).

6.4 The Probability Integral Transform

In this short section, a very special type of transformation is considered, the so-called probability integral transform. By means of this transformation, two results are derived. Roughly, these results state that, if $X \sim F$ and $Y = F(X)$, then, somewhat surprisingly, $Y$ is always distributed as $U(0, 1)$. Furthermore, for a given d.f. $F$, there is always a r.v. $X \sim F$; this r.v. is given by $X = F^{-1}(Y)$, where $Y \sim U(0, 1)$ and $F^{-1}$ is the inverse function of $F$. To facilitate the derivations, $F$ will be assumed to be (strictly) increasing.

THEOREM 10
For an increasing d.f. $F$, let $X \sim F$ and set $Y = F(X)$. Then $Y \sim U(0, 1)$.

PROOF Since $0 \le F(X) \le 1$, it suffices to consider $y \in [0, 1]$. Then

\[
P(Y \le y) = P[F(X) \le y] = P\{F^{-1}[F(X)] \le F^{-1}(y)\} = P[X \le F^{-1}(y)] = F[F^{-1}(y)] = y,
\]

so that $Y \sim U(0, 1)$. ▲

THEOREM 11
Let $F$ be a given increasing d.f., and let the r.v. $Y \sim U(0, 1)$. Define the r.v. $X$ by: $X = F^{-1}(Y)$. Then $X \sim F$.

PROOF For $x \in \mathbb{R}$,

\[
P(X \le x) = P[F^{-1}(Y) \le x] = P\{F[F^{-1}(Y)] \le F(x)\} = P[Y \le F(x)] = F(x),
\]

as was to be seen. ▲
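Theorems 10 and 11 can be illustrated by simulation. The following sketch is a hypothetical example, not part of the original text: it assumes the Negative Exponential d.f. $F(x) = 1 - e^{-\lambda x}$, $x > 0$, whose inverse is $F^{-1}(y) = -\frac{1}{\lambda}\log(1 - y)$, and all function names, sample sizes, and seeds are illustrative choices.

```python
import math
import random


def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def exp_cdf_inverse(y, lam):
    """F^{-1}(y) = -(1/lam) * log(1 - y), for 0 <= y < 1."""
    return -math.log(1.0 - y) / lam


def sample_via_inverse(lam, n, seed=3):
    """Theorem 11: X = F^{-1}(Y) with Y ~ U(0, 1) should be distributed as F."""
    rng = random.Random(seed)
    return [exp_cdf_inverse(rng.random(), lam) for _ in range(n)]


def uniform_via_cdf(lam, n, seed=4):
    """Theorem 10: Y = F(X) with X ~ F should be distributed as U(0, 1)."""
    rng = random.Random(seed)
    return [exp_cdf(rng.expovariate(lam), lam) for _ in range(n)]
```

The relative frequency of values of `sample_via_inverse(2.0, n)` not exceeding $x$ should approach $F(x)$, and the relative frequency of values of `uniform_via_cdf(2.0, n)` not exceeding $y$ should approach $y$.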


In the form of a verification of Theorems 10 and 11, consider the following simple examples.

Let the r.v. $X$ have the Negative Exponential distribution with parameter $\lambda$. Then, for $x > 0$, $F(x) = 1 - e^{-\lambda x}$. Let $Y$ be defined by: $Y = 1 - e^{-\lambda X}$. Then $Y$ should be $\sim U(0, 1)$.

DISCUSSION Indeed, for $0 < y < 1$,

\[
\begin{aligned}
P(Y \le y) &= P(1 - e^{-\lambda X} \le y) = P(e^{-\lambda X} \ge 1 - y) = P[-\lambda X \ge \log(1 - y)] \\
&= P\left[ X \le -\frac{1}{\lambda} \log(1 - y) \right] \\
&= 1 - \exp\left\{ (-\lambda)\left[ -\frac{1}{\lambda} \log(1 - y) \right] \right\} \\
&= 1 - \exp[\log(1 - y)] = 1 - (1 - y) = y,
\end{aligned}
\]

as was to be seen.

Let $F$ be the d.f. of the Negative Exponential distribution with parameter $\lambda$, so that $F(x) = 1 - e^{-\lambda x}$, $x > 0$. Let $y = 1 - e^{-\lambda x}$ and solve for $x$ to obtain $x = -\frac{1}{\lambda} \log(1 - y)$, $0 < y < 1$. Let $Y \sim U(0, 1)$ and define the r.v. $X$ by: $X = -\frac{1}{\lambda} \log(1 - Y)$. Then $X$ should be $\sim F$.

DISCUSSION Indeed,

\[
\begin{aligned}
P(X \le x) &= P\left[ -\frac{1}{\lambda} \log(1 - Y) \le x \right] = P[\log(1 - Y) \ge -\lambda x] \\
&= P(1 - Y \ge e^{-\lambda x}) = P(Y \le 1 - e^{-\lambda x}) = 1 - e^{-\lambda x},
\end{aligned}
\]

as was to be seen.

4.1 (i) Let $X$ be a r.v. with continuous and (strictly) increasing d.f. $F$, and define the r.v. $Y$ by $Y = F(X)$. Then use Theorem 2 in order to show that $Z = -2\log(1 - Y) \sim \chi^2_2$.

(ii) If $X_1, \ldots, X_n$ is a random sample with d.f. $F$ as described in part (i) and if $Y_i = F(X_i)$, $i = 1, \ldots, n$, then show that the r.v. $U = \sum_{i=1}^{n} Z_i \sim \chi^2_{2n}$, where $Z_i = -2\log(1 - Y_i)$, $i = 1, \ldots, n$.

Hint: For part (i), use Theorem 10, according to which $Y \sim U(0, 1)$.

6.5 Order Statistics


In this section, an unconventional kind of transformation is considered, which, when applied to r.v.'s, leads to the so-called order statistics. For the definition of the transformation, consider $n$ distinct numbers $x_1, \ldots, x_n$ and order them in ascending order. Denote by $x_{(1)}$ the smallest number: $x_{(1)}$ = smallest of $x_1, \ldots, x_n$; by $x_{(2)}$ the second smallest, and so on until $x_{(n)}$ is the $n$th smallest or, equivalently, the largest of the $x_i$'s. In a summary form, we write: $x_{(j)}$ = the $j$th smallest of the numbers $x_1, \ldots, x_n$, where $j = 1, \ldots, n$. Then, clearly, $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$. For simplicity, set $y_j = x_{(j)}$, $j = 1, \ldots, n$, so that again $y_1 < y_2 < \cdots < y_n$. The transformation under consideration is the one which transforms the $x_i$'s into the $y_j$'s in the way just described.

This transformation now applies to $n$ r.v.'s as follows.

Let $X_1, X_2, \ldots, X_n$ be i.i.d. r.v.'s with d.f. $F$. The $j$th order statistic of $X_1, X_2, \ldots, X_n$ is denoted by $X_{(j)}$, or $Y_j$ for easier writing, and is defined as follows:

\[
Y_j = j\text{th smallest of the } X_1, X_2, \ldots, X_n, \quad j = 1, \ldots, n
\]

(that is, for each $s \in \mathcal{S}$, look at $X_1(s), X_2(s), \ldots, X_n(s)$, and then $Y_j(s)$ is defined to be the $j$th smallest among the numbers $X_1(s), X_2(s), \ldots, X_n(s)$, $j = 1, 2, \ldots, n$). It follows that $Y_1 \le Y_2 \le \cdots \le Y_n$, and, in general, the $Y_j$'s are not independent.

We assume now that the $X_i$'s are of the continuous type with p.d.f. $f$ such that $f(x) > 0$, $(-\infty \le)\, a < x < b\, (\le \infty)$ and zero otherwise. One of the problems we are concerned with is that of finding the joint p.d.f. of the $Y_j$'s. By means of Theorem 6, it will be established that:





If $X_1, \ldots, X_n$ are i.i.d. r.v.'s with p.d.f. $f$ which is positive for $a < x < b$ and 0 otherwise, then the joint p.d.f. of the order statistics $Y_1, \ldots, Y_n$ is given by:

\[
g(y_1, \ldots, y_n) =
\begin{cases}
n!\, f(y_1) \cdots f(y_n), & a < y_1 < y_2 < \cdots < y_n < b \\
0, & \text{otherwise.}
\end{cases}
\tag{27}
\]








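PROOF The proof is carried out explicitly for $n = 3$, but it is easily seen, with the proper change in notation, to be valid in the general case as well. In the first place, since for $i \neq j$,

\[
P(X_i = X_j) = \iint_{(x_i = x_j)} f(x_i) f(x_j)\, dx_i\, dx_j = \int_a^b \int_{x_j}^{x_j} f(x_i) f(x_j)\, dx_i\, dx_j = 0,
\]

and therefore $P(X_i = X_j = X_k) = 0$ for $i \neq j \neq k$, we may assume that the joint p.d.f., $f(\cdot, \cdot, \cdot)$, of $X_1, X_2, X_3$ is zero, if at least two of the arguments $x_1, x_2, x_3$ are equal. Thus, we have:

\[
f(x_1, x_2, x_3) =
\begin{cases}
f(x_1) f(x_2) f(x_3), & a < x_1 \neq x_2 \neq x_3 < b \\
0, & \text{otherwise.}
\end{cases}
\]

Therefore $f(x_1, x_2, x_3)$ is positive on the set $S$, where

\[
S = \{(x_1, x_2, x_3) \in \mathbb{R}^3;\ a < x_i < b,\ i = 1, 2, 3,\ x_1, x_2, x_3 \text{ all different}\}.
\]

Let $S_{ijk} \subset S$ be defined by:

\[
S_{ijk} = \{(x_1, x_2, x_3);\ a < x_i < x_j < x_k < b\}, \quad i, j, k = 1, 2, 3,\ i \neq j \neq k.
\]

Then we have that these six events are pairwise disjoint and (essentially)

\[
S = S_{123} \cup S_{132} \cup S_{213} \cup S_{231} \cup S_{312} \cup S_{321}.
\]

Now on each one of the $S_{ijk}$'s there exists a one-to-one transformation from the $x_i$'s to the $y_i$'s defined as follows:

\[
\begin{aligned}
S_{123} &: y_1 = x_1,\ y_2 = x_2,\ y_3 = x_3 \\
S_{132} &: y_1 = x_1,\ y_2 = x_3,\ y_3 = x_2 \\
S_{213} &: y_1 = x_2,\ y_2 = x_1,\ y_3 = x_3 \\
S_{231} &: y_1 = x_2,\ y_2 = x_3,\ y_3 = x_1 \\
S_{312} &: y_1 = x_3,\ y_2 = x_1,\ y_3 = x_2 \\
S_{321} &: y_1 = x_3,\ y_2 = x_2,\ y_3 = x_1.
\end{aligned}
\]

Solving for the $x_i$'s, we have then:

\[
\begin{aligned}
S_{123} &: x_1 = y_1,\ x_2 = y_2,\ x_3 = y_3 \\
S_{132} &: x_1 = y_1,\ x_2 = y_3,\ x_3 = y_2 \\
S_{213} &: x_1 = y_2,\ x_2 = y_1,\ x_3 = y_3 \\
S_{231} &: x_1 = y_3,\ x_2 = y_1,\ x_3 = y_2 \\
S_{312} &: x_1 = y_2,\ x_2 = y_3,\ x_3 = y_1 \\
S_{321} &: x_1 = y_3,\ x_2 = y_2,\ x_3 = y_1.
\end{aligned}
\]

The Jacobians are thus given by:

\[
\begin{aligned}
S_{123} &: J_{123} = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1, &
S_{231} &: J_{231} = \begin{vmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} = 1, \\
S_{132} &: J_{132} = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} = -1, &
S_{312} &: J_{312} = \begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{vmatrix} = 1, \\
S_{213} &: J_{213} = \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -1, &
S_{321} &: J_{321} = \begin{vmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{vmatrix} = -1.
\end{aligned}
\]

Hence $|J_{123}| = \cdots = |J_{321}| = 1$, and Theorem 6 gives

\[
g(y_1, y_2, y_3) =
\begin{cases}
f(y_1) f(y_2) f(y_3) + f(y_1) f(y_3) f(y_2) + f(y_2) f(y_1) f(y_3) \\
\quad + f(y_3) f(y_1) f(y_2) + f(y_2) f(y_3) f(y_1) + f(y_3) f(y_2) f(y_1), & a < y_1 < y_2 < y_3 < b \\
0, & \text{otherwise.}
\end{cases}
\]

That is,

\[
g(y_1, y_2, y_3) =
\begin{cases}
3!\, f(y_1) f(y_2) f(y_3), & a < y_1 < y_2 < y_3 < b \\
0, & \text{otherwise.}
\end{cases}
\quad ▲
\]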


Notice that the proof in the general case is exactly the same. One has n! 
regions forming S, one for each permutation of the integers 1 through n. From 
the definition of a determinant and the fact that each row and column contains 
exactly one 1 and the rest all 0, it follows that the n! Jacobians are either 1 or —1 
and the remaining part of the proof is identical to the one just given except 
that one adds up n! like terms instead of 3!. 

The theorem is illustrated by the following two examples. 


Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s distributed as $N(\mu, \sigma^2)$. Then the joint p.d.f. of the order statistics $Y_1, \ldots, Y_n$ is given by

\[
g(y_1, \ldots, y_n) = n! \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left[ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} (y_j - \mu)^2 \right],
\]

if $-\infty < y_1 < \cdots < y_n < \infty$, and zero otherwise.

Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s distributed as $U(\alpha, \beta)$. Then the joint p.d.f. of the order statistics $Y_1, \ldots, Y_n$ is given by

\[
g(y_1, \ldots, y_n) = \frac{n!}{(\beta - \alpha)^n},
\]

if $\alpha < y_1 < \cdots < y_n < \beta$, and zero otherwise.

From the joint p.d.f. in (27), it is relatively easy to derive the p.d.f. of $Y_j$ for any $j$, as well as the joint p.d.f. of $Y_i$ and $Y_j$ for any $1 \le i < j \le n$. We restrict ourselves to the derivation of the distributions of $Y_1$ and $Y_n$ alone.





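Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with d.f. $F$ and p.d.f. $f$ which is positive and continuous for $(-\infty \le)\, a < x < b\, (\le \infty)$ and zero otherwise, and let $Y_1, \ldots, Y_n$ be the order statistics. Then the p.d.f.'s $g_1$ and $g_n$ of $Y_1$ and $Y_n$, respectively, are given by:

\[
g_1(y_1) =
\begin{cases}
n[1 - F(y_1)]^{n-1} f(y_1), & a < y_1 < b \\
0, & \text{otherwise,}
\end{cases}
\tag{28}
\]

and

\[
g_n(y_n) =
\begin{cases}
n[F(y_n)]^{n-1} f(y_n), & a < y_n < b \\
0, & \text{otherwise.}
\end{cases}
\tag{29}
\]

PROOF First, derive the d.f.'s involved and then differentiate them to obtain the respective p.d.f.'s. To this end,

\[
\begin{aligned}
G_n(y_n) &= P(Y_n \le y_n) = P[\max(X_1, \ldots, X_n) \le y_n] \\
&= P(\text{all } X_1, \ldots, X_n \le y_n) = P(X_1 \le y_n, \ldots, X_n \le y_n) \\
&= P(X_1 \le y_n) \cdots P(X_n \le y_n) \quad \text{(by the independence of the } X_i\text{'s)} \\
&= [F(y_n)]^n.
\end{aligned}
\]

That is, $G_n(y_n) = [F(y_n)]^n$, so that

\[
g_n(y_n) = \frac{d}{dy_n} G_n(y_n) = n[F(y_n)]^{n-1} \frac{d}{dy_n} F(y_n) = n[F(y_n)]^{n-1} f(y_n).
\]

Likewise,

\[
\begin{aligned}
1 - G_1(y_1) &= P(Y_1 > y_1) = P[\min(X_1, \ldots, X_n) > y_1] \\
&= P(\text{all } X_1, \ldots, X_n > y_1) = P(X_1 > y_1, \ldots, X_n > y_1) \\
&= P(X_1 > y_1) \cdots P(X_n > y_1) \quad \text{(by the independence of the } X_i\text{'s)} \\
&= [1 - P(X_1 \le y_1)] \cdots [1 - P(X_n \le y_1)] = [1 - F(y_1)]^n.
\end{aligned}
\]

That is, $1 - G_1(y_1) = [1 - F(y_1)]^n$, so that

\[
-g_1(y_1) = \frac{d}{dy_1}[1 - G_1(y_1)] = n[1 - F(y_1)]^{n-1} \frac{d}{dy_1}[1 - F(y_1)]
= n[1 - F(y_1)]^{n-1}[-f(y_1)] = -n[1 - F(y_1)]^{n-1} f(y_1),
\]

and hence

\[
g_1(y_1) = n[1 - F(y_1)]^{n-1} f(y_1). \quad ▲
\]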


As an illustration of the theorem, consider the following example.

EXAMPLE 10
Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as $U(0, 1)$. Then, for $0 < y_1, y_n < 1$:

\[
g_1(y_1) = n(1 - y_1)^{n-1} \quad \text{and} \quad g_n(y_n) = n y_n^{n-1}.
\]

DISCUSSION Here, for $0 < x < 1$, $f(x) = 1$ and $F(x) = x$. Therefore relations (28) and (29) give, for $0 < y_1, y_n < 1$:

\[
g_1(y_1) = n(1 - y_1)^{n-1} \cdot 1 = n(1 - y_1)^{n-1} \quad \text{and} \quad g_n(y_n) = n y_n^{n-1},
\]

as asserted.

As a further illustration of the theorem, consider the following example, which is of interest in its own right.

EXAMPLE 11
If $X_1, \ldots, X_n$ are independent r.v.'s having the Negative Exponential distribution with parameter $\lambda$, then $Y_1$ has also the Negative Exponential distribution with parameter $n\lambda$.

DISCUSSION Here $f(x) = \lambda e^{-\lambda x}$ and $F(x) = 1 - e^{-\lambda x}$ for $x > 0$. Then, for $y_1 > 0$, formula (28) yields:

\[
g_1(y_1) = n(e^{-\lambda y_1})^{n-1} \times \lambda e^{-\lambda y_1} = (n\lambda) e^{-(n-1)\lambda y_1} e^{-\lambda y_1} = (n\lambda) e^{-(n\lambda) y_1},
\]

as was to be seen.
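The conclusions of the last two examples lend themselves to a simulation check. The sketch below is illustrative: the function name, the number of replications, and the seeds are assumptions. It draws repeated samples, records the smallest and largest observation of each, and compares relative frequencies with the d.f.'s obtained in the proof above, namely $P(Y_n \le y) = [F(y)]^n$ and $P(Y_1 > y) = [1 - F(y)]^n$.

```python
import math
import random


def simulate_extremes(draw, n, reps=50_000, seed=5):
    """Return the lists of minima and maxima of `reps` samples of size n.

    `draw` is a function taking a random.Random instance and returning one observation.
    """
    rng = random.Random(seed)
    minima, maxima = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        minima.append(min(sample))
        maxima.append(max(sample))
    return minima, maxima


def fraction(values, predicate):
    """Relative frequency of the values satisfying the predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)
```

For $U(0, 1)$ observations one would use `draw = lambda rng: rng.random()`, and for Negative Exponential observations with parameter $\lambda$, `draw = lambda rng: rng.expovariate(lam)`. In the latter case the relative frequency of minima exceeding $t$ should be close to $e^{-n\lambda t}$.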


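(i) In a complex system, $n$ identical components are connected serially, so
that the system works, if and only if all $n$ components function. If the
lifetime of said components is described by a r.v. $X$ with d.f. $F$ and p.d.f.
$f$, write out the expression for the probability that the system functions
for at least $t$ time units.

(ii) Do the same as in part (i), if the components are connected in parallel,
so that the system functions, if and only if at least one of the components
works.

(iii) Simplify the expressions in parts (i) and (ii), if $f$ is the Negative
Exponential with parameter $\lambda$.

DISCUSSION
(i) Clearly,

\begin{align*}
P(\text{system works for at least $t$ time units})
&= P(X_1 \ge t, \ldots, X_n \ge t) \quad \text{(where $X_i$ is the lifetime of the $i$th component)} \\
&= P(Y_1 \ge t) \quad \text{(where $Y_1$ is the smallest order statistic)} \\
&= \int_t^\infty g_1(y)\,dy \quad \text{(where $g_1$ is the p.d.f. of $Y_1$)} \\
&= \int_t^\infty n[1 - F(y)]^{n-1} f(y)\,dy \quad \text{(by (28))}. \tag{30}
\end{align*}

(ii) Here

\begin{align*}
P(\text{system works for at least $t$ time units})
&= P(\text{at least one of } X_1, \ldots, X_n \ge t) \\
&= P(Y_n \ge t) \quad \text{(where $Y_n$ is the largest order statistic)} \\
&= \int_t^\infty g_n(y)\,dy \quad \text{(where $g_n$ is the p.d.f. of $Y_n$)} \\
&= \int_t^\infty n[F(y)]^{n-1} f(y)\,dy \quad \text{(by (29))}. \tag{31}
\end{align*}

(iii) Here $F(y) = 1 - e^{-\lambda y}$ and $f(y) = \lambda e^{-\lambda y}$ $(y > 0)$ from Example 11. Also,
from the same example, the p.d.f. of $Y_1$ is $g_1(y) = (n\lambda)e^{-(n\lambda)y}$, so that (30)
gives:

\begin{align*}
P(Y_1 \ge t) &= \int_t^\infty (n\lambda)e^{-(n\lambda)y}\,dy = -\int_t^\infty d\,e^{-(n\lambda)y} \\
&= -e^{-(n\lambda)y}\Big|_t^\infty = e^{-n\lambda t},
\end{align*}

and, by (31),

\[ P(Y_n \ge t) = \int_t^\infty n(1 - e^{-\lambda y})^{n-1} \lambda e^{-\lambda y}\,dy. \]

For example, for $n = 2$, this last probability is equal to:

\begin{align*}
\int_t^\infty 2(1 - e^{-\lambda y})\lambda e^{-\lambda y}\,dy
&= 2\int_t^\infty \lambda e^{-\lambda y}\,dy - \int_t^\infty 2\lambda e^{-2\lambda y}\,dy \\
&= -2\int_t^\infty d\,e^{-\lambda y} + \int_t^\infty d\,e^{-2\lambda y} \\
&= -2e^{-\lambda y}\Big|_t^\infty + e^{-2\lambda y}\Big|_t^\infty \\
&= 2e^{-\lambda t} - e^{-2\lambda t}.
\end{align*}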
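The two system reliabilities can be computed directly. The sketch below is illustrative only: the function names and the values of `lam` and `t` are assumptions. It uses $P(Y_1 \ge t) = e^{-n\lambda t}$ for the serial system and, for the parallel system, $P(Y_n \ge t) = 1 - [F(t)]^n$, which follows from $G_n(y_n) = [F(y_n)]^n$ in the proof of the theorem. A crude midpoint rule applied to the integral in (31) serves as a cross-check.

```python
import math

def series_survival(t, n, lam):
    """P(Y_1 >= t): all n serially connected components outlive t."""
    return math.exp(-n * lam * t)

def parallel_survival(t, n, lam):
    """P(Y_n >= t) = 1 - [F(t)]^n: at least one component outlives t."""
    return 1.0 - (1.0 - math.exp(-lam * t)) ** n

def parallel_survival_by_integral(t, n, lam, upper=80.0, steps=200_000):
    """Midpoint-rule evaluation of the integral in (31), truncated at `upper`."""
    h = (upper - t) / steps
    total = 0.0
    for k in range(steps):
        y = t + (k + 0.5) * h
        total += n * (1.0 - math.exp(-lam * y)) ** (n - 1) * lam * math.exp(-lam * y)
    return total * h

lam, t = 0.7, 1.0  # illustrative values, not taken from the text
for n in (1, 2, 5):
    print(n, series_survival(t, n, lam), parallel_survival(t, n, lam),
          parallel_survival_by_integral(t, n, lam))
```

For $n = 2$ the closed form reduces to $2e^{-\lambda t} - e^{-2\lambda t}$, the value obtained in part (iii).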





5.1 Let $X_1, \ldots, X_n$ be independent r.v.'s with p.d.f. $f(x) = c x^{-(c+1)}$, $x > 1$
$(c > 0)$, and set $U = Y_1 = \min(X_1, \ldots, X_n)$, $V = Y_n = \max(X_1, \ldots, X_n)$.
(i) Determine the d.f. $F$ corresponding to the p.d.f. $f$.
(ii) Use Theorem 13 to determine the p.d.f.'s $f_U$ and $f_V$.

5.2 Refer to Example 10 and calculate the expectations $EY_1$ and $EY_n$, and
also determine the $\lim EY_n$ as $n \to \infty$.

5.3 Let $Y_1$ and $Y_n$ be the smallest and the largest order statistics based on a
random sample $X_1, \ldots, X_n$ from the $U(\alpha, \beta)$ $(\alpha < \beta)$ distribution.
(i) For $n = 3$ and $n = 4$, show that the joint p.d.f. of $Y_1$ and $Y_n$ is given,
respectively, by:

\[ g_{13}(y_1, y_3) = \frac{3 \times 2}{(\beta - \alpha)^3}(y_3 - y_1), \quad \alpha < y_1 < y_3 < \beta, \]
\[ g_{14}(y_1, y_4) = \frac{4 \times 3}{(\beta - \alpha)^4}(y_4 - y_1)^2, \quad \alpha < y_1 < y_4 < \beta. \]

(ii) Generalize the preceding results and show that:

\[ g_{1n}(y_1, y_n) = \frac{n(n-1)}{(\beta - \alpha)^n}(y_n - y_1)^{n-2}, \quad \alpha < y_1 < y_n < \beta. \]

Hint: For part (ii), all one has to do is to calculate the integrals:

\[ \int_{y_1}^{y_n} \int_{y_1}^{y_{n-1}} \cdots \int_{y_1}^{y_4} \int_{y_1}^{y_3} dy_2\,dy_3 \cdots dy_{n-2}\,dy_{n-1}, \]

which is done one at a time; also, observe the pattern emerging.

5.4 Let $Y_1$ and $Y_n$ be the smallest and the largest order statistics based on a
random sample $X_1, \ldots, X_n$ from the $U(0, 1)$ distribution. Then show that:

\[ \mathrm{Cov}(Y_1, Y_n) = \frac{1}{(n+1)^2(n+2)}. \]

Hint: Use the joint p.d.f. taken from Exercise 5.3(ii) for $\alpha = 0$ and
$\beta = 1$.

5.5 If $Y_1$ and $Y_n$ are the smallest and the largest order statistics based on a
random sample $X_1, \ldots, X_n$ from the $U(0, 1)$ distribution:
(i) Show that the p.d.f. of the sample range $R = Y_n - Y_1$ is given by:

\[ f_R(r) = n(n-1) r^{n-2}(1 - r), \quad 0 < r < 1. \]

(ii) Also, calculate the expectation $ER$.

5.6 Refer to Example 11 and set $Z = nY_1$. Then show that $Z$ is distributed as
the $X_i$'s.

5.7 The lifetimes of two batteries are independent r.v.'s $X$ and $Y$ with the
Negative Exponential distribution with parameter $\lambda$. Suppose that the
two batteries are connected serially, so that the system works if and only
if both work.
(i) Use Example 11 (with $n = 2$) to calculate the probability that the
system works beyond time $t > 0$.
(ii) What is the expected lifetime of the system?
(iii) What do parts (i) and (ii) become for $\lambda = 1/3$?

5.8 Let $Y_1$ and $Y_n$ be the smallest and the largest order statistics based on a
random sample $X_1, \ldots, X_n$ from the Negative Exponential distribution
with parameter $\lambda$. Then, by Example 11, $g_1(y_1) = (n\lambda)e^{-(n\lambda)y_1}$, $y_1 > 0$.
(i) Use relation (29) (with $a = 0$ and $b = \infty$) to determine the p.d.f. $g_n$
of the r.v. $Y_n$.
(ii) Calculate the $EY_n$ for $n = 2$ and $n = 3$.


5.9 (i) Refer to Exercise 5.8(i) and show that:

\[ EY_n = \frac{n}{\lambda} \sum_{r=0}^{n-1} (-1)^{n-r-1} \frac{\binom{n-1}{r}}{(n-r)^2}. \]

(ii) Apply part (i) for $n = 2$ and $n = 3$ to recover the values found in
Exercise 5.8(ii).

Hint: Consider the binomial expansion: $(a + b)^k = \sum_{r=0}^{k} \binom{k}{r} a^r b^{k-r}$
and apply it to: $(1 - e^{-\lambda y})^{n-1}$ for $a = 1$, $b = -e^{-\lambda y}$ and $k = n - 1$.
Then carry out the multiplications indicated and integrate term by
term.


5.10 Let $X_1, \ldots, X_n$ be a random sample of size $n$ of the continuous type with
d.f. $F$ and p.d.f. $f$, positive in $-\infty \le a < x < b \le \infty$, and let $Y_1$ and $Y_n$ be
the smallest and the largest order statistics of the $X_i$'s. Use relation (27)
in order to show that the joint p.d.f. $g_{1n}$ of the r.v.'s $Y_1$ and $Y_n$ is given by
the expression:

\[ g_{1n}(y_1, y_n) = n(n-1)[F(y_n) - F(y_1)]^{n-2} f(y_1) f(y_n), \quad a < y_1 < y_n < b. \]

Hint: The p.d.f. $g_{1n}$ is obtained by integrating $g(y_1, \ldots, y_n)$ in (27)
with respect to $y_{n-1}, y_{n-2}, \ldots, y_2$ as indicated below:

\[ g_{1n}(y_1, y_n) = n!\, f(y_1) f(y_n) \int_{y_1}^{y_n} \cdots \int_{y_{n-3}}^{y_n} \int_{y_{n-2}}^{y_n} f(y_{n-1}) f(y_{n-2}) \cdots f(y_2)\, dy_{n-1}\, dy_{n-2} \cdots dy_2. \]

However,

\[ \int_{y_{n-2}}^{y_n} f(y_{n-1})\,dy_{n-1} = F(y_n) - F(y_{n-2}) = \frac{[F(y_n) - F(y_{n-2})]^1}{1!}, \]

\begin{align*}
\int_{y_{n-3}}^{y_n} \frac{[F(y_n) - F(y_{n-2})]^1}{1!} f(y_{n-2})\,dy_{n-2}
&= -\int_{y_{n-3}}^{y_n} \frac{[F(y_n) - F(y_{n-2})]^1}{1!}\, d[F(y_n) - F(y_{n-2})] \\
&= -\frac{[F(y_n) - F(y_{n-2})]^2}{2!}\Big|_{y_{n-3}}^{y_n} = \frac{[F(y_n) - F(y_{n-3})]^2}{2!},
\end{align*}

and continuing on like this, we finally get:

\begin{align*}
\int_{y_1}^{y_n} \frac{[F(y_n) - F(y_2)]^{n-3}}{(n-3)!} f(y_2)\,dy_2
&= -\int_{y_1}^{y_n} \frac{[F(y_n) - F(y_2)]^{n-3}}{(n-3)!}\, d[F(y_n) - F(y_2)] \\
&= -\frac{[F(y_n) - F(y_2)]^{n-2}}{(n-2)!}\Big|_{y_1}^{y_n} = \frac{[F(y_n) - F(y_1)]^{n-2}}{(n-2)!}.
\end{align*}

Since $\frac{n!}{(n-2)!} = n(n-1)$, the result follows.


Chapter 7
Some Modes of Convergence of Random Variables, Applications


The first thing which is done in this chapter is to introduce two modes of con- 
vergence for sequences of r.v.'s, convergence in distribution and convergence 
in probability, and then to investigate their relationship. 

A suitable application of these convergences leads to the most important 
results in this chapter, which are the Weak Law of Large Numbers and the 
Central Limit Theorem. These results are illustrated by concrete examples, 
including numerical examples in the case of the Central Limit Theorem. 

In the final section of the chapter, it is shown that convergence in probability 
is preserved under continuity. This is also the case, in a limited sense, for 
convergence in distribution. These statements are illustrated by two general 
results and a specific application. 

The proofs of some of the theorems stated are given in considerable detail; 
in some cases, only a rough outline is presented, whereas in other cases, we 
restrict ourselves to the statements of the theorems alone. 





7.1 Convergence in Distribution or in Probability and their Relationship


In all that follows, $X_1, \ldots, X_n$ are i.i.d. r.v.'s, which may be either discrete
or continuous. In applications, these r.v.'s represent $n$ independent observations
on a r.v. $X$, associated with an underlying phenomenon which is of importance
to us. In a probabilistic/statistical environment, our interest lies in
knowing the distribution of $X$, whether it is represented by the probabilities
$P(X \in B)$, $B \subseteq \Re$, or the d.f. $F$ of the $X_i$'s, or their p.d.f. $f$. In practice,
this distribution is unknown to us. It would then be desirable to approximate
the unknown distribution, in some sense, by a known distribution. In this
section, the foundation is set for such an approximation.

[Figure 7.1: The d.f. represented by the solid curve is approximated by the d.f.'s represented by the other curves.]


DEFINITION 1 

Let $Y_1, \ldots, Y_n$ be r.v.'s with respective d.f.'s $F_1, \ldots, F_n$. The r.v.'s may
be either discrete or continuous and need be neither independent nor
identically distributed. Also, let $Y$ be a r.v. with d.f. $G$. We say that the
sequence of r.v.'s $\{Y_n\}$, $n \ge 1$, converges in distribution to the r.v. $Y$ as
$n \to \infty$ and write $Y_n \xrightarrow[n\to\infty]{d} Y$, if $F_n(x) \xrightarrow[n\to\infty]{} G(x)$ for all continuity points $x$
of $G$.


























The following example illustrates the definition. 


EXAMPLE 1
For $n \ge 1$, let the d.f.'s $F_n$ and the d.f. $G$ be given by:

\[ F_n(x) = \begin{cases} 0, & \text{if } x < 1 - \frac{1}{n} \\ \frac{1}{2}, & \text{if } 1 - \frac{1}{n} \le x < 1 + \frac{1}{n} \\ 1, & \text{if } x \ge 1 + \frac{1}{n}, \end{cases} \qquad G(x) = \begin{cases} 0, & \text{if } x < 1 \\ 1, & \text{if } x \ge 1, \end{cases} \]

and discuss whether or not $F_n(x)$ converges to $G(x)$ as $n \to \infty$.

[Figure 7.2: The d.f. $G$ is approximated by the d.f.'s $F_n$ at all points $x \ne 1$.]

DISCUSSION The d.f. $G$ is continuous everywhere except for the point
$x = 1$. For $x < 1$, let $n_0 > 1/(1 - x)$. Then $x < 1 - \frac{1}{n_0}$ and also $x < 1 - \frac{1}{n}$
for all $n \ge n_0$. Thus, $F_n(x) = 0$, $n \ge n_0$. For $x > 1$, let $n_0 > 1/(x - 1)$. Then
$x > 1 + \frac{1}{n_0}$ and also $x > 1 + \frac{1}{n}$ for all $n \ge n_0$, so that $F_n(x) = 1$, $n \ge n_0$. Thus, for
$x \ne 1$, $F_n(x) \to G(x)$, so, if $Y_n$ and $Y$ are r.v.'s such that $Y_n \sim F_n$ and $Y \sim G$,
then $Y_n \xrightarrow[n\to\infty]{d} Y$.

REMARK 1 The example also illustrates the point that, if $x$ is a discontinuity
point of $G$, then $F_n(x)$ need not converge to $G(x)$. In Example 1, $F_n(1) = \frac{1}{2}$ for
all $n$, and $G(1) = 1$.
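The behavior described in Example 1 and Remark 1 can be tabulated directly. The following sketch is an illustration only (the helper names `F_n` and `G` mirror the notation of the example; the evaluation points are arbitrary choices): it prints $F_n(x)$ for growing $n$ at one point on each side of 1 and at the discontinuity point $x = 1$ itself.

```python
def F_n(x, n):
    """d.f. F_n of Example 1: jumps of size 1/2 at 1 - 1/n and at 1 + 1/n."""
    if x < 1 - 1 / n:
        return 0.0
    if x < 1 + 1 / n:
        return 0.5
    return 1.0

def G(x):
    """Limiting d.f. G: a single jump of size 1 at x = 1."""
    return 0.0 if x < 1 else 1.0

for x in (0.9, 1.0, 1.1):
    # At x = 0.9 and x = 1.1 the values settle at G(x); at x = 1 they stay at 1/2.
    print(x, [F_n(x, n) for n in (1, 5, 20, 100)], G(x))
```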

The idea, of course, behind Definition 1 is the approximation of the (presumably
unknown) probability $P(Y \le x) = G(x)$ by the (presumably known)
probabilities $P(Y_n \le x) = F_n(x)$, for large enough $n$. Convergence in distribution
also allows the approximation of probabilities of the form $P(x < Y \le y)$
by the probabilities $P(x < Y_n \le y)$, for $x$ and $y$ continuity points of $G$. This is
so because

\[ P(x < Y_n \le y) = P(Y_n \le y) - P(Y_n \le x) = F_n(y) - F_n(x) \xrightarrow[n\to\infty]{} G(y) - G(x) = P(x < Y \le y). \]

Whereas convergence in distribution allows the comparison of certain
probabilities, calculated in terms of the individual r.v.'s $Y_n$ and $Y$, it does
not provide evaluation of probabilities calculated on the joint behavior of $Y_n$
and $Y$. This is taken care of to a satisfactory extent by the following mode of
convergence.


DEFINITION 2 
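The sequence of r.v.'s $\{Y_n\}$, $n \ge 1$, converges in probability to the r.v.
$Y$ as $n \to \infty$, if, for every $\varepsilon > 0$, $P(|Y_n - Y| > \varepsilon) \xrightarrow[n\to\infty]{} 0$; equivalently,
$P(|Y_n - Y| \le \varepsilon) \xrightarrow[n\to\infty]{} 1$. The notation used is: $Y_n \xrightarrow[n\to\infty]{P} Y$.

Thus, if the event $A_n(\varepsilon)$ is defined by: $A_n(\varepsilon) = \{s \in S;\ Y(s) - \varepsilon \le Y_n(s) \le Y(s) + \varepsilon\}$
(that is, the event for which the r.v. $Y_n$ is within $\varepsilon$ from
the r.v. $Y$), then $P(A_n(\varepsilon)) \xrightarrow[n\to\infty]{} 1$ for every $\varepsilon > 0$. Equivalently, $P(A_n^c(\varepsilon)) =
P(\{s \in S;\ Y_n(s) < Y(s) - \varepsilon \text{ or } Y_n(s) > Y(s) + \varepsilon\}) \xrightarrow[n\to\infty]{} 0$.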




















The probability that $Y_n$ lies within a small neighborhood around $Y$, such as
$(Y - \varepsilon, Y + \varepsilon)$, is as close to 1 as one pleases, provided $n$ is sufficiently large.

It is rather clear that convergence in probability is stronger than con- 
vergence in distribution. That this is, indeed, the case is illustrated by the 
following example, where we have convergence in distribution but not in 
probability. 


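EXAMPLE 2
Let $S = \{1, 2, 3, 4\}$, and on the subsets of $S$, let $P$ be the discrete uniform
probability function. Define the following r.v.'s:

\[ X_n(1) = X_n(2) = 1, \quad X_n(3) = X_n(4) = 0, \quad n = 1, 2, \ldots, \]

and

\[ X(1) = X(2) = 0, \quad X(3) = X(4) = 1. \]

DISCUSSION Then

\[ |X_n(s) - X(s)| = 1 \quad \text{for all } s \in S. \]

Hence $X_n$ does not converge in probability to $X$, as $n \to \infty$. Now,

\[ F_{X_n}(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{2}, & 0 \le x < 1 \\ 1, & x \ge 1, \end{cases} \qquad G(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{2}, & 0 \le x < 1 \\ 1, & x \ge 1, \end{cases} \]

so that $F_{X_n}(x) = G(x)$ for all $x \in \Re$. Thus, trivially, $F_{X_n}(x) \to G(x)$ for all continuity
points of $G$; that is, $X_n \xrightarrow[n\to\infty]{d} X$, but $X_n$ does not converge in probability
to $X$.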
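Because the sample space in Example 2 has only four points, both claims can be verified by direct enumeration. The sketch below is illustrative (the dictionary encoding and the helper `df` are assumptions of this sketch, not the text's notation); exact fractions are used so that the comparison of the two d.f.'s involves no rounding.

```python
from fractions import Fraction

S = (1, 2, 3, 4)
P = {s: Fraction(1, 4) for s in S}   # discrete uniform probability on S

X_n = {1: 1, 2: 1, 3: 0, 4: 0}       # the same function for every n
X = {1: 0, 2: 0, 3: 1, 4: 1}

def df(rv, x):
    """d.f. of the r.v. `rv` at x: total probability of {s : rv(s) <= x}."""
    return sum(P[s] for s in S if rv[s] <= x)

# The two d.f.'s agree at every test point (they are the same step function) ...
same_df = all(df(X_n, x) == df(X, x) for x in (-1, 0, Fraction(1, 2), 1, 2))

# ... yet X_n and X differ by exactly 1 at every sample point.
prob_far = sum(P[s] for s in S if abs(X_n[s] - X[s]) > Fraction(1, 2))

print(same_df, prob_far)
```

Here `prob_far` is $P(|X_n - X| > 1/2)$, which does not depend on $n$ and therefore cannot tend to 0.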

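The precise relationship between convergence in distribution and convergence
in probability is stated in the following theorem.

THEOREM 1
Let $\{Y_n\}$, $n \ge 1$, be a sequence of r.v.'s and let $Y$ be a r.v. Then $Y_n \xrightarrow[n\to\infty]{P} Y$
always implies $Y_n \xrightarrow[n\to\infty]{d} Y$. The converse is not true in general (as illustrated
by Example 2). However, it is true if $P(Y = c) = 1$, where $c$ is a
constant. That is, $Y_n \xrightarrow[n\to\infty]{d} c$ implies $Y_n \xrightarrow[n\to\infty]{P} c$, so that $Y_n \xrightarrow[n\to\infty]{P} c$ if and
only if $Y_n \xrightarrow[n\to\infty]{d} c$.

PROOF (outline) That $Y_n \xrightarrow[n\to\infty]{P} Y$ implies $Y_n \xrightarrow[n\to\infty]{d} Y$ is established by employing
the concepts of $\liminf$ (limit inferior) and $\limsup$ (limit superior) of a
sequence of numbers, and we choose not to pursue it. For the proof of
the fact that $Y_n \xrightarrow[n\to\infty]{d} c$ implies $Y_n \xrightarrow[n\to\infty]{P} c$, observe that $F(x) = 0$ for $x < c$ and
$F(x) = 1$ for $x \ge c$, where $F$ is the d.f. of $c$, so that $c - \varepsilon$ and $c + \varepsilon$ are continuity
points of $F$ for all $\varepsilon > 0$. But

\[ P(|Y_n - c| \le \varepsilon) = P(c - \varepsilon \le Y_n \le c + \varepsilon) = P(Y_n \le c + \varepsilon) - P(Y_n < c - \varepsilon) = F_n(c + \varepsilon) - P(Y_n < c - \varepsilon). \]

However, $F_n(c + \varepsilon) \xrightarrow[n\to\infty]{} 1$ and $P(Y_n < c - \varepsilon) \le P(Y_n \le c - \varepsilon) = F_n(c - \varepsilon) \xrightarrow[n\to\infty]{} 0$, so that
$P(Y_n < c - \varepsilon) \xrightarrow[n\to\infty]{} 0$. Thus, $P(|Y_n - c| \le \varepsilon) \xrightarrow[n\to\infty]{} 1$, or $Y_n \xrightarrow[n\to\infty]{P} c$. ▲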


According to Definition 1, in order to establish that $Y_n \xrightarrow[n\to\infty]{d} Y$, all one
has to do is to prove the (pointwise) convergence $F_n(x) \xrightarrow[n\to\infty]{} F(x)$ for every
continuity point $x$ of $F$. As is often the case, however, definitions do not
lend themselves to checking the concepts defined. This also holds here.
Accordingly, convergence in distribution is delegated to convergence of m.g.f.'s,
which, in general, is a much easier task to perform. That this can be done is
based on the following deep probabilistic result. Its justification is omitted
entirely.





THEOREM 2
(Continuity Theorem) For $n = 1, 2, \ldots$, let $Y_n$ and $Y$ be r.v.'s with
respective d.f.'s $F_n$ and $F$, and respective m.g.f.'s $M_n$ and $M$ (which are
assumed to be finite at least in an interval $(-c, c)$, some $c > 0$). Then:

(i) If $F_n(x) \xrightarrow[n\to\infty]{} F(x)$ for all continuity points $x$ of $F$, it follows that
$M_n(t) \xrightarrow[n\to\infty]{} M(t)$ for all $t \in (-c, c)$.

(ii) Let $M_n(t) \xrightarrow[n\to\infty]{} g(t)$, $t \in (-c, c)$, some function $g$, which is continuous
at $t = 0$. Then $g$ is, actually, a m.g.f. and let $F$ be the corresponding
d.f. It follows that $F_n(x) \xrightarrow[n\to\infty]{} F(x)$ for all continuity points $x$
of $F$.

Thus, according to this result, $Y_n \xrightarrow[n\to\infty]{d} Y$ or, equivalently, $F_n(x) \xrightarrow[n\to\infty]{}
F(x)$ for all continuity points $x$ of $F$, if and only if $M_n(t) \xrightarrow[n\to\infty]{} M(t)$, $t \in (-c, c)$,
some $c > 0$. The fact that convergence of m.g.f.'s implies convergence of the
respective d.f.'s is the most useful part from a practical viewpoint.


1.1 For $n = 1, 2, \ldots$, let $X_n$ be a r.v. with d.f. $F_n$ defined by: $F_n(x) = 0$ for $x < n$,
and $F_n(x) = 1$ for $x \ge n$. Then show that $F_n(x) \xrightarrow[n\to\infty]{} F(x)$, which is identically
0 in $\Re$ and hence it is not a d.f. of a r.v.


1.2 Let $\{X_n\}$, $n \ge 1$, be r.v.'s with $X_n$ taking the values 1 and 0 with respective
probabilities $p_n$ and $1 - p_n$; i.e., $P(X_n = 1) = p_n$ and $P(X_n = 0) = 1 - p_n$.
Then show that $X_n \xrightarrow[n\to\infty]{P} 0$, if and only if $p_n \xrightarrow[n\to\infty]{} 0$.

1.3 For $n = 1, 2, \ldots$, let $X_n$ be a r.v. distributed as $B(n, p_n)$ and suppose that
$np_n \xrightarrow[n\to\infty]{} \lambda \in (0, \infty)$. Then show that $X_n \xrightarrow[n\to\infty]{d} X$, where $X$ is a r.v. distributed
as $P(\lambda)$, by showing that $M_{X_n}(t) \xrightarrow[n\to\infty]{} M_X(t)$, $t \in \Re$.

1.4 Let $Y_{1,n}$ and $Y_{n,n}$ be the smallest and the largest order statistics based on
the random sample $X_1, \ldots, X_n$ from the $U(0, 1)$ distribution. Then show
that:
(i) $Y_{1,n} \xrightarrow[n\to\infty]{P} 0$; (ii) $Y_{n,n} \xrightarrow[n\to\infty]{P} 1$.
Hint: For $\varepsilon > 0$, calculate the probabilities: $P(|Y_{1,n}| > \varepsilon)$ and $P(|Y_{n,n} - 1| > \varepsilon)$
and show that they tend to 0 as $n \to \infty$. Use the p.d.f.'s of $Y_{1,n}$
and $Y_{n,n}$ determined in Example 10 of Chapter 6.

1.5 Refer to Exercise 1.4 and set: $U_n = nY_{1,n}$, $V_n = n(1 - Y_{n,n})$, and let $U$ and
$V$ be r.v.'s having the Negative Exponential distribution with parameter
$\lambda = 1$. Then:
(i) Derive the p.d.f.'s of the r.v.'s $U_n$ and $V_n$.
(ii) Derive the d.f.'s of the r.v.'s $U_n$ and $V_n$, and show that $U_n \xrightarrow[n\to\infty]{d} U$ by
showing that

\[ F_{U_n}(u) \xrightarrow[n\to\infty]{} F_U(u), \quad u \in \Re. \]

Likewise for $V_n$.

1.6 We say that a sequence $\{X_n\}$, $n \ge 1$, of r.v.'s converges to a r.v. $X$ in
quadratic mean and write

\[ X_n \xrightarrow[n\to\infty]{q.m.} X \quad \text{or} \quad X_n \xrightarrow[n\to\infty]{(2)} X, \quad \text{if } E(X_n - X)^2 \xrightarrow[n\to\infty]{} 0. \]

Now, if $X_1, \ldots, X_n$ are i.i.d. r.v.'s with (finite) expectation $\mu$ and (finite)
variance $\sigma^2$, show that the sample mean $\bar{X}_n \xrightarrow[n\to\infty]{q.m.} \mu$.


1.7 In Theorem 1 of Chapter 4, the following version of the Cauchy-Schwarz
inequality was established: For any two r.v.'s $X$ and $Y$ with $EX = EY = 0$
and $\mathrm{Var}(X) = \mathrm{Var}(Y) = 1$, it holds: $|E(XY)| \le 1$. (This is, actually, part
only of said inequality.) Another more general version of this inequality
is the following: For any two r.v.'s $X$ and $Y$ with finite expectations and
variances, it holds: $|E(XY)| \le E|XY| \le E^{1/2}|X|^2 \times E^{1/2}|Y|^2$.
(i) Prove the inequality in this setting.
(ii) For any r.v. $X$, show that $|EX| \le E|X| \le E^{1/2}|X|^2$.
Hint: For part (i), use the obvious result $(x \pm y)^2 = x^2 + y^2 \pm 2xy \ge 0$ in
order to conclude that $\pm xy \le \frac{1}{2}(x^2 + y^2)$ and hence $|xy| \le \frac{1}{2}(x^2 + y^2)$.
Next, replace $x$ by $X/E^{1/2}|X|^2$, and $y$ by $Y/E^{1/2}|Y|^2$ (assuming, of
course, that $E|X|^2 > 0$, $E|Y|^2 > 0$, because otherwise the inequality
is, trivially, true), and take the expectations of both sides to arrive
at the desirable result.








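1.8 Let $\{X_n\}$ and $\{Y_n\}$, $n \ge 1$, be two sequences of r.v.'s such that: $X_n \xrightarrow[n\to\infty]{q.m.} X$,
some r.v., and $X_n - Y_n \xrightarrow[n\to\infty]{q.m.} 0$. Then show that $Y_n \xrightarrow[n\to\infty]{q.m.} X$.
Hint: Use appropriately the Cauchy-Schwarz inequality discussed in
Exercise 1.7.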


7.2 Some Applications of Convergence in Distribution: The Weak Law of Large Numbers and the Central Limit Theorem


As a first application of the concept of convergence in distribution, we have 
the so-called Weak Law of Large Numbers (WLLN). This result is stated and 
proved, an interpretation is provided, and then a number of specific applica- 
tions are presented. 


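THEOREM 3
(Weak Law of Large Numbers, WLLN) Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s
with (common) finite expectation $\mu$, and let $\bar{X}_n$ be the sample mean of
$X_1, \ldots, X_n$. Then $\bar{X}_n \xrightarrow[n\to\infty]{d} \mu$, or (on account of Theorem 1) $\bar{X}_n \xrightarrow[n\to\infty]{P} \mu$.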











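PROOF The proof is a one-line proof, if it happens that the $X_i$'s also have
a (common) finite variance $\sigma^2$ (which they are not required to have for the
validity of the theorem). Since $E\bar{X}_n = \mu$ and $\mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$, the Tchebichev
inequality gives, for every $\varepsilon > 0$, $P(|\bar{X}_n - \mu| > \varepsilon) \le \frac{1}{\varepsilon^2} \times \frac{\sigma^2}{n} \xrightarrow[n\to\infty]{} 0$, so that
$\bar{X}_n \xrightarrow[n\to\infty]{P} \mu$.

Without reference to the variance, one would have to show that $M_{\bar{X}_n}(t) \xrightarrow[n\to\infty]{}
M_\mu(t)$ (for $t \in (-c, c)$, some $c > 0$). Let $M$ stand for the (common) m.g.f. of the
$X_i$'s. Then use familiar properties of the m.g.f. and independence of the $X_i$'s in
order to obtain:

\[ M_{\bar{X}_n}(t) = M_{\sum_{i=1}^n X_i}\left(\frac{t}{n}\right) = \prod_{i=1}^n M_{X_i}\left(\frac{t}{n}\right) = \left[M\left(\frac{t}{n}\right)\right]^n. \]

Consider the function $M(z)$, and expand it around $z = 0$ according to
Taylor's formula up to terms of first order to get:

\begin{align*}
M(z) &= M(0) + \frac{z}{1!}\,\frac{d}{dz}M(z)\Big|_{z=0} + R(z) \quad \left(\tfrac{1}{z}R(z) \to 0 \text{ as } z \to 0\right) \\
&= 1 + z\mu + R(z),
\end{align*}

since $M(0) = 1$ and $\frac{d}{dz}M(z)|_{z=0} = EX_1 = \mu$. Replacing $z$ by $t/n$, for fixed $t$, the
last formula becomes:

\[ M\left(\frac{t}{n}\right) = 1 + \frac{t}{n}\mu + R\left(\frac{t}{n}\right), \quad \text{where } nR\left(\frac{t}{n}\right) \to 0 \text{ as } n \to \infty. \]

Therefore

\[ M_{\bar{X}_n}(t) = \left[1 + \frac{\mu t + nR\left(\frac{t}{n}\right)}{n}\right]^n, \]

and this converges to $e^{\mu t}$, as $n \to \infty$, by Remark 2 below. Since $e^{\mu t}$ is the
m.g.f. of (the degenerate r.v.) $\mu$, we have shown that $M_{\bar{X}_n}(t) \xrightarrow[n\to\infty]{} M_\mu(t)$, as was
to be seen. ▲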


REMARK 2 For every $z \in \Re$, one way of defining the exponential function
$e^z$ is: $e^z = \lim_{n\to\infty}\left(1 + \frac{z}{n}\right)^n$. It is a consequence of this result that, as $n \to \infty$,
also $\left(1 + \frac{z_n}{n}\right)^n \to e^z$ whenever $z_n \to z$.

The interpretation and most common use of the WLLN is that, if $\mu$ is an
unknown entity, which is typically the case in statistics, then $\mu$ may be approximated
(in the sense of distribution or probability) by the known entity $\bar{X}_n$, for
sufficiently large $n$.
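Two quantities from the proof of the WLLN are easy to evaluate numerically: the limit in Remark 2 and the Tchebichev bound $\sigma^2/(n\varepsilon^2)$. The sketch below is illustrative only; the function names, the perturbing sequence $z_n = z + 1/n$, and the numerical values are assumptions chosen for demonstration.

```python
import math

def power_form(z_n, n):
    """(1 + z_n / n)^n, the expression appearing in Remark 2."""
    return (1.0 + z_n / n) ** n

def chebyshev_bound(sigma2, n, eps):
    """Upper bound sigma^2 / (n * eps^2) on P(|sample mean - mu| > eps), capped at 1."""
    return min(1.0, sigma2 / (n * eps ** 2))

z = 1.5  # illustrative value
for n in (10, 1_000, 100_000):
    z_n = z + 1.0 / n  # one possible sequence with z_n -> z
    print(n, power_form(z_n, n), math.exp(z))

for n in (100, 10_000, 1_000_000):
    # With sigma^2 = 1 and eps = 0.05 the bound decreases like 1/n.
    print(n, chebyshev_bound(1.0, n, 0.05))
```

The bound is crude for small $n$ (it is capped at 1 because it bounds a probability), but it is enough to force convergence to 0.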


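7.2.1 Applications of the WLLN

1. If the independent $X_i$'s are distributed as $B(1, p)$, then $EX_i = p$ and therefore
$\bar{X}_n \xrightarrow[n\to\infty]{P} p$.

2. If the independent $X_i$'s are distributed as $P(\lambda)$, then $EX_i = \lambda$ and therefore
$\bar{X}_n \xrightarrow[n\to\infty]{P} \lambda$.

3. If the independent $X_i$'s are distributed as $N(\mu, \sigma^2)$, then $EX_i = \mu$ and
therefore $\bar{X}_n \xrightarrow[n\to\infty]{P} \mu$.

4. If the independent $X_i$'s are distributed as Negative Exponential with parameter
$\lambda$, $f(x) = \lambda e^{-\lambda x}$, $x > 0$, then $EX_i = 1/\lambda$ and therefore $\bar{X}_n \xrightarrow[n\to\infty]{P} 1/\lambda$.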


A somewhat more involved application is that of the approximation of
an entire d.f. by the so-called empirical d.f. To this effect:

5. Let $X_1, X_2, \ldots, X_n$ be i.i.d. r.v.'s with d.f. $F$, and define the empirical d.f. $F_n$
as follows. For each $x \in \Re$ and each $s \in S$,

\[ F_n(x, s) = \frac{1}{n}[\text{number of } X_1(s), \ldots, X_n(s) \le x]. \]

From this definition, it is immediate that, for each fixed $x \in \Re$, $F_n(x, s)$ is a
r.v. as a function of $s$, and for each fixed $s \in S$, $F_n(x, s)$ is a d.f. as a function of $x$.
Actually, if we set $Y_i(x, s) = 1$ when $X_i(s) \le x$, and $Y_i(x, s) = 0$ when $X_i(s) > x$,
then $Y_1(x, \cdot), \ldots, Y_n(x, \cdot)$ are r.v.'s which are independent and distributed as
$B(1, F(x))$, since $P[Y_i(x, \cdot) = 1] = P(X_i \le x) = F(x)$. Also, $EY_i(x, \cdot) = F(x)$.
Then $F_n(x, s)$ may be rewritten as:

\[ F_n(x, s) = \frac{1}{n}\sum_{i=1}^n Y_i(x, s), \quad \text{the sample mean of } Y_1(x, s), \ldots, Y_n(x, s). \]

By omitting the sample point $s$, as is usually the case, we write $F_n(x)$ and
$Y_i(x)$, $i = 1, \ldots, n$, rather than $F_n(x, s)$ and $Y_i(x, s)$, $i = 1, \ldots, n$, respectively.
Then $F_n(x) \xrightarrow[n\to\infty]{P} F(x)$ for each $x \in \Re$. Thus, for every $x \in \Re$, the value $F(x)$
of the (potentially unknown) d.f. $F$ is approximated by the (known) values
$F_n(x)$ of the r.v.'s $F_n(x)$.

REMARK 3 Actually, it can be shown that the convergence $F_n(x) \xrightarrow[n\to\infty]{P} F(x)$
is uniform in $x \in \Re$. This implies that, for every $\varepsilon > 0$, there is a positive
integer $N(\varepsilon)$ such that $F_n(x) - \varepsilon < F(x) < F_n(x) + \varepsilon$ with probability as close
to 1 as one pleases simultaneously for all $x \in \Re$, provided $n \ge N(\varepsilon)$.
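For a given set of observations, the empirical d.f. is simply the proportion of observations not exceeding $x$. A minimal sketch follows; the function name `empirical_df` and the data values are made up for illustration.

```python
def empirical_df(sample, x):
    """F_n(x): proportion of the observations that are <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

data = [0.2, 0.9, 0.4, 0.4, 0.7]  # made-up observations

for x in (0.1, 0.4, 0.8, 1.0):
    print(x, empirical_df(data, x))
```

Each term of the sum is one of the indicator r.v.'s $Y_i(x)$ defined above, so the function returns their sample mean, as in the rewritten form of $F_n(x, s)$.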

As another application of the concept of convergence in distribution, we 
obtain, perhaps, the most celebrated theorem of Probability Theory; it is the 
so-called Central Limit Theorem (CLT), which is stated and proved below. 
Comments on the significance of the CLT follow, and the section is concluded 
by applications and numerical examples. 





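THEOREM 4
(Central Limit Theorem, CLT) Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with finite
expectation $\mu$ and finite and positive variance $\sigma^2$, and let $\bar{X}_n$ be the sample
mean of $X_1, \ldots, X_n$. Then:

\[ \frac{\bar{X}_n - E\bar{X}_n}{\sqrt{\mathrm{Var}(\bar{X}_n)}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow[n\to\infty]{d} Z \sim N(0, 1), \]

or

\[ P\left[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right] \xrightarrow[n\to\infty]{} \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx, \quad z \in \Re. \tag{1} \]

(Also, see Remark 4(iii).)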











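REMARK 4
(i) Denote by $S_n$ the partial sum $\sum_{i=1}^n X_i$, $S_n = \sum_{i=1}^n X_i$, so that $ES_n = n\mu$
and $\mathrm{Var}(S_n) = n\sigma^2$. Then:

\[ \frac{S_n - ES_n}{\sqrt{\mathrm{Var}(S_n)}} = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}. \]

Therefore, by (1):

\[ P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) \xrightarrow[n\to\infty]{} \Phi(z), \quad z \in \Re. \tag{2} \]

(Although the notation $S_n$ has been used before (relation (12) in Chapter
5) to denote the sample standard deviation of $X_1, \ldots, X_n$, there should be
no confusion; from the context, it should be clear what $S_n$ stands for.)

(ii) An interpretation of (1) and (2) is that, for sufficiently large $n$:

\[ P\left[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right] = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) \simeq \Phi(z), \quad z \in \Re. \tag{3} \]

Often this approximation is also denoted (rather loosely) as follows:

\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \simeq N(0, 1) \quad \text{or} \quad \bar{X}_n \simeq N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or} \quad S_n \simeq N(n\mu, n\sigma^2). \tag{4} \]

(iii) Actually, it can be shown that the convergence in (1) or (2) is uniform in
$z \in \Re$. That is to say, if we set

\[ F_n(z) = P\left[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right] = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right), \tag{5} \]

then

\[ F_n(z) \xrightarrow[n\to\infty]{} \Phi(z) \quad \text{uniformly in } z \in \Re. \tag{6} \]

To be more precise, for every $\varepsilon > 0$, there exists a positive integer $N(\varepsilon)$
independent of $z \in \Re$, such that

\[ |F_n(z) - \Phi(z)| < \varepsilon \quad \text{for } n \ge N(\varepsilon) \text{ and all } z \in \Re. \tag{7} \]

(iv) The approximation of the probability $F_n(z)$ by $\Phi(z)$, provided by the CLT,
is also referred to as Normal approximation for obvious reasons.

(v) On account of (3), the CLT also allows for the approximation of probabilities
of the form $P(a < S_n \le b)$ for any $a < b$. Indeed,

\begin{align*}
P(a < S_n \le b) &= P(S_n \le b) - P(S_n \le a) \\
&= P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le \frac{b - n\mu}{\sigma\sqrt{n}}\right) - P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le \frac{a - n\mu}{\sigma\sqrt{n}}\right) \\
&= P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le b_n^*\right) - P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le a_n^*\right),
\end{align*}

where

\[ a_n^* = \frac{a - n\mu}{\sigma\sqrt{n}} \quad \text{and} \quad b_n^* = \frac{b - n\mu}{\sigma\sqrt{n}}. \tag{8} \]

By (8),

\[ P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le b_n^*\right) \simeq \Phi(b_n^*) \quad \text{and} \quad P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le a_n^*\right) \simeq \Phi(a_n^*), \]

so that

\[ P(a < S_n \le b) \simeq \Phi(b_n^*) - \Phi(a_n^*). \tag{9} \]

The uniformity referred to in Remark 4(iii) is what, actually, validates
many of the applications of the CLT. This is the case, for instance, in
Remark 4(v).

(vi) So, the convergence in (i) is a special case of the convergence depicted in
Figure 7.1, where the limiting d.f. is $\Phi$ and $F_n$ is the d.f. of $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$. This
convergence holds for all $x \in \Re$ since $\Phi$ is a continuous function in $\Re$.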


From a large collection of bolts which is known to contain 3% defective bolts,
1,000 are chosen at random. If $X$ is the number of the defective bolts among
those chosen, what is the (approximate) probability that $X$ does not exceed
5% of 1,000?

DISCUSSION With the selection of the $i$th bolt, associate the r.v. $X_i$ to take
the value 1, if the bolt is defective, and 0 otherwise. Then it may be assumed
that the r.v.'s $X_i$, $i = 1, \ldots, 1{,}000$ are independently distributed as $B(1, 0.03)$.
Furthermore, it is clear that $X = \sum_{i=1}^{1{,}000} X_i$. Since 5% of 1,000 is 50, the required
probability is: $P(X \le 50)$. Since $EX_i = 0.03$, $\mathrm{Var}(X_i) = 0.03 \times 0.97 = 0.0291$,
the CLT gives:

\begin{align*}
P(X \le 50) &= P(0 \le X \le 50) = P(-0.5 < X \le 50) \\
&= P(X \le 50) - P(X \le -0.5) \simeq \Phi(b_n^*) - \Phi(a_n^*),
\end{align*}

where

\[ a_n^* = \frac{-0.5 - 1{,}000 \times 0.03}{\sqrt{1{,}000 \times 0.03 \times 0.97}} = -\frac{30.5}{\sqrt{29.1}} \simeq -\frac{30.5}{5.394} \simeq -5.65, \]
\[ b_n^* = \frac{50 - 1{,}000 \times 0.03}{\sqrt{1{,}000 \times 0.03 \times 0.97}} = \frac{20}{\sqrt{29.1}} \simeq \frac{20}{5.394} \simeq 3.71, \]

so that

\[ P(X \le 50) \simeq \Phi(3.71) - \Phi(-5.65) \simeq \Phi(3.71) = 0.999896. \]


A certain manufacturing process produces vacuum tubes whose lifetimes in
hours are independent r.v.'s with Negative Exponential distribution with mean
1,500 hours. What is the probability that the total life of 50 tubes will exceed
80,000 hours?

DISCUSSION If $X_i$ is the r.v. denoting the lifetime of the $i$th vacuum tube,
then $X_i$, $i = 1, \ldots, 50$ are independent Negative Exponentially distributed
with $EX_i = \frac{1}{\lambda} = 1{,}500$ and $\mathrm{Var}(X_i) = \frac{1}{\lambda^2} = 1{,}500^2$. Since $nEX_i = 50 \times 1{,}500 =
75{,}000$, $\sigma\sqrt{n} = 1{,}500\sqrt{50}$, if we set $S_{50} = \sum_{i=1}^{50} X_i$, then the required probability
is:

\begin{align*}
P(S_{50} > 80{,}000) &= 1 - P(S_{50} \le 80{,}000) \simeq 1 - \Phi\left(\frac{80{,}000 - 75{,}000}{1{,}500\sqrt{50}}\right) \\
&= 1 - \Phi\left(\frac{\sqrt{50}}{15}\right) \simeq 1 - \Phi(0.47) \\
&= 1 - 0.680822 = 0.319178 \simeq 0.319.
\end{align*}
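The two Normal approximations above can be reproduced without tables. The sketch below is an illustration under stated assumptions: the helper names `Phi` and `normal_approx` are hypothetical, and $\Phi$ is evaluated through the standard identity $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$ using `math.erf` from the Python standard library. Because the code does not round $a_n^*$ and $b_n^*$ to two decimals, its results may differ from the table-based values in the last digits.

```python
import math

def Phi(z):
    """Standard Normal d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx(a, b, n, mu, sigma):
    """Approximate P(a < S_n <= b) by Phi(b*) - Phi(a*), as in relations (8) and (9)."""
    scale = sigma * math.sqrt(n)
    return Phi((b - n * mu) / scale) - Phi((a - n * mu) / scale)

# Defective bolts: X_i ~ B(1, 0.03), n = 1,000, P(-0.5 < X <= 50).
p = 0.03
bolts = normal_approx(-0.5, 50, 1000, p, math.sqrt(p * (1 - p)))

# Vacuum tubes: mean and standard deviation both 1,500 hours, n = 50,
# P(S_50 > 80,000) = 1 - P(S_50 <= 80,000).
tubes = 1.0 - Phi((80_000 - 50 * 1_500) / (1_500 * math.sqrt(50)))

print(bolts, tubes)
```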


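The proof of the theorem is based on the same ideas as those used in the
proof of the WLLN and goes as follows.

PROOF OF THEOREM 4 Set $Z_i = \frac{X_i - \mu}{\sigma}$, so that $Z_1, \ldots, Z_n$ are i.i.d. r.v.'s with
$EZ_i = 0$ and $\mathrm{Var}(Z_i) = 1$. Also,

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i = \frac{1}{\sigma\sqrt{n}}(S_n - n\mu) = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}. \tag{10} \]

With $F_n$ defined by (5), we wish to show that (6) holds (except for the uniformity
assertion, with which we will not concern ourselves). Its justification is
provided by Lemma 1, pages 206-207, in the book "A Course in Mathematical
Statistics," 2nd edition (1997), Academic Press, by G. G. Roussas. By Theorem
2, it suffices to show that, for all $t$,

\[ M_{\sqrt{n}(\bar{X}_n - \mu)/\sigma}(t) \xrightarrow[n\to\infty]{} M_Z(t) = e^{t^2/2}. \tag{11} \]

By means of (10), and with $M$ standing for the (common) m.g.f. of the $Z_i$'s, we
have:

\[ M_{\sqrt{n}(\bar{X}_n - \mu)/\sigma}(t) = M_{\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i}(t) = M_{\sum_{i=1}^n Z_i}\left(\frac{t}{\sqrt{n}}\right) = \left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n. \tag{12} \]

Expand the function $M(z)$ around $z = 0$ according to Taylor's formula up to
terms of second order to get:

\begin{align*}
M(z) &= M(0) + \frac{z}{1!}\,\frac{d}{dz}M(z)\Big|_{z=0} + \frac{z^2}{2!}\,\frac{d^2}{dz^2}M(z)\Big|_{z=0} + R(z) \\
&= 1 + zEZ_1 + \frac{z^2}{2}EZ_1^2 + R(z) \\
&= 1 + \frac{z^2}{2} + R(z), \quad \text{where } \frac{1}{z^2}R(z) \to 0 \text{ as } z \to 0.
\end{align*}

In this last formula, replace $z$ by $t/\sqrt{n}$, for fixed $t$, in order to obtain:

\[ M\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + R\left(\frac{t}{\sqrt{n}}\right), \quad nR\left(\frac{t}{\sqrt{n}}\right) \to 0 \text{ as } n \to \infty. \]

Therefore (12) becomes:

\[ M_{\sqrt{n}(\bar{X}_n - \mu)/\sigma}(t) = \left[1 + \frac{t^2}{2n} + R\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left\{1 + \frac{\frac{t^2}{2}\left[1 + \frac{2n}{t^2}R\left(\frac{t}{\sqrt{n}}\right)\right]}{n}\right\}^n, \]

and this converges to $e^{t^2/2}$, as $n \to \infty$, by Remark 2. This completes the proof
of the theorem. ▲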
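The convergence in (11) can be observed numerically in a simple special case. As an assumption made only for this illustration, let each $Z_i$ take the values $+1$ and $-1$ with probability $\frac{1}{2}$ each, so that $EZ_i = 0$, $\mathrm{Var}(Z_i) = 1$, and $M(z) = \frac{1}{2}(e^z + e^{-z}) = \cosh z$. By (12), the m.g.f. of the standardized sum is then $[\cosh(t/\sqrt{n})]^n$.

```python
import math

def mgf_standardized_sum(t, n):
    """[M(t / sqrt(n))]^n from (12), with M(z) = cosh(z) for Z_i = +1 or -1 equally likely."""
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.0  # illustrative value
for n in (1, 10, 1_000, 1_000_000):
    # The first number approaches the second, exp(t^2 / 2), as n grows.
    print(n, mgf_standardized_sum(t, n), math.exp(t * t / 2.0))
```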


7.2.2 Applications of the CLT

In all of the following applications, it will be assumed that $n$ is sufficiently
large, so that the CLT will apply.

1. Let the independent $X_i$'s be distributed as $B(1, p)$, set $S_n = \sum_{i=1}^n X_i$, and
let $a$, $b$ be integers such that $0 \le a < b \le n$. By an application of the CLT, we
wish to find an approximate value to the probability $P(a < S_n \le b)$.

If $p$ denotes the proportion of defective items in a large lot of certain
items, then $S_n$ is the number of actually defective items among the $n$ sampled.
Then approximation of the probability $P(a < S_n \le b)$ is meaningful when the
Binomial tables are not usable (either because of $p$ or because of $n$ or, perhaps,
because of both).

Here $EX_i = p$, $\mathrm{Var}(X_i) = pq$ $(q = 1 - p)$, and therefore by (9):

\[ P(a < S_n \le b) \simeq \Phi(b_n^*) - \Phi(a_n^*), \quad \text{where } a_n^* = \frac{a - np}{\sqrt{npq}}, \quad b_n^* = \frac{b - np}{\sqrt{npq}}. \tag{13} \]


REMARK 5 If the required probability is of any one of the forms: $P(a \le
S_n \le b)$ or $P(a \le S_n < b)$ or $P(a < S_n < b)$, then formula (9) applies again,
provided the necessary adjustments are first made; namely, $P(a \le S_n \le b) =
P(a - 1 < S_n \le b)$, $P(a \le S_n < b) = P(a - 1 < S_n \le b - 1)$, $P(a < S_n < b) =
P(a < S_n \le b - 1)$. However, if the underlying distribution is continuous, then
$P(a < S_n \le b) = P(a \le S_n \le b) = P(a \le S_n < b) = P(a < S_n < b)$, and no
adjustments are required for the approximation in (9) to hold.


EXAMPLE 5
(Numerical) For $n = 100$ and $p = \frac{1}{2}$ or $p = \frac{5}{16}$, find the probability $P(45 \le
S_n \le 55)$.

DISCUSSION

(i) For $p = \frac{1}{2}$, it is seen (from tables) that the exact value is equal to: 0.7288.
For the Normal approximation, we have: $P(45 \le S_n \le 55) = P(44 < S_n \le
55)$ and, by (13):

\[ a^* = \frac{44 - 100 \times \frac{1}{2}}{\sqrt{100 \times \frac{1}{2} \times \frac{1}{2}}} = -\frac{6}{5} = -1.2, \qquad b^* = \frac{55 - 100 \times \frac{1}{2}}{\sqrt{100 \times \frac{1}{2} \times \frac{1}{2}}} = \frac{5}{5} = 1. \]

Therefore $\Phi(b^*) - \Phi(a^*) = \Phi(1) - \Phi(-1.2) = \Phi(1) + \Phi(1.2) - 1 =
0.841345 + 0.884930 - 1 = 0.7263$. So:

Exact value: 0.7288, Approximate value: 0.7263,

and the exact probability is underestimated by 0.0025, or the approximating
probability is about 99.66% of the exact probability.

(ii) For $p = \frac{5}{16}$, the exact probability is almost 0; 0.0000. For the approximate
probability, we find $a^* = 2.75$ and $b^* = 5.12$, so that $\Phi(b^*) - \Phi(a^*) =
0.0030$. Thus:

Exact value: 0.0000, Approximate value: 0.0030,

and the exact probability is overestimated by 0.0030.


2. If the underlying distribution is $P(\lambda)$, then $ES_n = \mathrm{Var}(S_n) = n\lambda$ and
formulas (8) and (9) become:

\[ P(a < S_n \le b) \simeq \Phi(b_n^*) - \Phi(a_n^*), \quad a_n^* = \frac{a - n\lambda}{\sqrt{n\lambda}}, \quad b_n^* = \frac{b - n\lambda}{\sqrt{n\lambda}}. \]

The comments made in Remark 4 apply here also.

EXAMPLE 6
(Numerical) In the Poisson distribution $P(\lambda)$, let $n$ and $\lambda$ be so that $n\lambda = 16$
and find the probability $P(12 \le S_n \le 21)$ $(= P(11 < S_n \le 21))$.

DISCUSSION The exact value (found from tables) is: 0.7838. For the
Normal approximation, we have:

\[ a^* = \frac{11 - 16}{\sqrt{16}} = -\frac{5}{4} = -1.25, \qquad b^* = \frac{21 - 16}{\sqrt{16}} = \frac{5}{4} = 1.25, \]

so that $\Phi(b^*) - \Phi(a^*) = \Phi(1.25) - \Phi(-1.25) = 2\Phi(1.25) - 1 = 2 \times 0.894350 - 1
= 0.7887$. So:

Exact value: 0.7838, Approximate value: 0.7887,

and the exact probability is overestimated by 0.0049, or the approximating
probability is about 100.63% of the exact probability.
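The exact values quoted from tables in Example 5(i) and Example 6 can be recomputed by summing the Binomial and Poisson probabilities directly, and compared with the Normal approximation without continuity correction. The following is a sketch under stated assumptions: the helper names are hypothetical, `math.comb` requires Python 3.8 or later, and $\Phi$ is computed through `math.erf`.

```python
import math

def Phi(z):
    """Standard Normal d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_prob(lo, hi, n, p):
    """Exact P(lo <= S_n <= hi) for S_n ~ B(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def poisson_prob(lo, hi, mean):
    """Exact P(lo <= S_n <= hi) for S_n ~ P(mean)."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(lo, hi + 1))

# Example 5(i): n = 100, p = 1/2, P(45 <= S_n <= 55) = P(44 < S_n <= 55).
n, p = 100, 0.5
sd = math.sqrt(n * p * (1 - p))
binom_exact = binomial_prob(45, 55, n, p)
binom_approx = Phi((55 - n * p) / sd) - Phi((44 - n * p) / sd)

# Example 6: n * lambda = 16, P(12 <= S_n <= 21) = P(11 < S_n <= 21).
mean = 16.0
pois_exact = poisson_prob(12, 21, mean)
pois_approx = Phi((21 - mean) / math.sqrt(mean)) - Phi((11 - mean) / math.sqrt(mean))

print(binom_exact, binom_approx)
print(pois_exact, pois_approx)
```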


L. 7.2.3 The Continuity Correction 


When a discrete distribution is approximated by the Normal distribution, the 
error committed is easy to see in a geometric picture. This is done, for instance 
in Figure 7.3, where the p.d.f. of the B(10, 0.2) distribution is approximated 
by the p.d.f. of the N(10 x 0.2, 10 x 0.2 x 0.8) = N(, 1.6) distribution (see 
relation (4)). From the same figure, it is also clear how the approximation may 
be improved. 


Figure 7.3 





Exact and 
Approximate Values 
for the Probability 
P(a< Sn < bn) = 
P(a— 1 < Sn < 
bn) = P< Sn <3) 

















Now 
P(I < Sn < 3) = PQ < Sn < 3) = fr) + fa) 
= shaded area, 


while the approximation without correction is the area bounded by the Normal 
curve, the horizontal axis, and the abscissas 1 and 3. Clearly, the correction, 
given by the area bounded by the Normal curve, the horizontal axis, and the 
abscissas 1.5 and 3.5, is closer to the exact area. 

To summarize, under the conditions of the CLT, and for discrete r.v.'s,
$P(a < S_n \le b) \simeq \Phi(b^*) - \Phi(a^*)$, where $a^* = \frac{a - n\mu}{\sigma\sqrt{n}}$ and $b^* = \frac{b - n\mu}{\sigma\sqrt{n}}$, without continuity
correction, and $P(a < S_n \le b) \simeq \Phi(b') - \Phi(a')$, where $a' = \frac{a + 0.5 - n\mu}{\sigma\sqrt{n}}$ and
$b' = \frac{b + 0.5 - n\mu}{\sigma\sqrt{n}}$, with continuity correction.

For integer-valued r.v.'s and probabilities of the form $P(a \le S_n \le b)$, we
first rewrite the expression as follows:

$P(a \le S_n \le b) = P(a - 1 < S_n \le b),$

and then apply the preceding approximations in order to obtain:
$P(a \le S_n \le b) \simeq \Phi(b^*) - \Phi(a^*)$, where $a^* = \frac{a - 1 - n\mu}{\sigma\sqrt{n}}$ and $b^* = \frac{b - n\mu}{\sigma\sqrt{n}}$, without continuity
correction, and $P(a \le S_n \le b) \simeq \Phi(b') - \Phi(a')$, where $a' = \frac{a - 0.5 - n\mu}{\sigma\sqrt{n}}$ and
$b' = \frac{b + 0.5 - n\mu}{\sigma\sqrt{n}}$, with continuity
correction. Similarly for the intervals [a, b) and (a, b).
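
The two approximations for $P(a \le S_n \le b)$ can be compared numerically. The following is only a minimal sketch in Python, using the Poisson case with parameter 16 and the probability $P(12 \le X \le 21)$ treated in the examples of this section; the function names (`poisson_pmf`, `exact_prob`, `clt_prob`) are illustrative choices, not notation from the text.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # d.f. of the N(0, 1) distribution


def poisson_pmf(k, lam):
    """P(X = k) for X ~ P(lam), computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def exact_prob(a, b, lam):
    """Exact P(a <= X <= b) for X ~ P(lam) and integers a <= b."""
    return sum(poisson_pmf(k, lam) for k in range(a, b + 1))


def clt_prob(a, b, mean, sd, corrected):
    """Normal approximation to P(a <= S <= b) for an integer-valued S."""
    if corrected:
        lo, hi = a - 0.5, b + 0.5   # continuity correction
    else:
        lo, hi = a - 1, b           # P(a <= S <= b) = P(a - 1 < S <= b)
    return PHI((hi - mean) / sd) - PHI((lo - mean) / sd)


lam = 16  # mean and variance of the P(16) distribution
exact = exact_prob(12, 21, lam)
plain = clt_prob(12, 21, lam, math.sqrt(lam), corrected=False)
corrected = clt_prob(12, 21, lam, math.sqrt(lam), corrected=True)

print(f"exact                      : {exact:.4f}")
print(f"CLT, no correction         : {plain:.4f}")
print(f"CLT, continuity correction : {corrected:.4f}")
```

Under these assumptions, the corrected value should land closer to the exact probability than the uncorrected one, in line with the figures quoted in the examples.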

The improvement brought about by the continuity correction is demon- 
strated by the following numerical examples. 


EXAMPLE (continued)


DISCUSSION 

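(i) For $p = \frac{1}{2}$, we get:

$a' = \frac{44 + 0.5 - 100 \times \frac{1}{2}}{\sqrt{100 \times \frac{1}{2} \times \frac{1}{2}}} = -\frac{5.5}{5} = -1.1,$

$b' = \frac{55 + 0.5 - 100 \times \frac{1}{2}}{\sqrt{100 \times \frac{1}{2} \times \frac{1}{2}}} = \frac{5.5}{5} = 1.1,$

so that:

$\Phi(b') - \Phi(a') = \Phi(1.1) - \Phi(-1.1) = 2\Phi(1.1) - 1 = 2 \times 0.864334 - 1 = 0.7286.$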


Thus, we have: 


Exact value: 0.7288, 
Approximate value with continuity correction: 0.7286, 


and the approximation underestimates the probability by only 0.0002, or 
the approximating probability (with continuity correction) is about 99.97% 
of the exact probability. 

(ii) For $p = \frac{5}{16}$, we have $a' = 2.86$, $b' = 5.23$ and $\Phi(b') - \Phi(a') = 0.0021$.
Then: 


Exact value: 0.0000, 
Approximate value with continuity correction: 0.0021, 


and the probability is overestimated by only 0.0021.


EXAMPLE (continued)


DISCUSSION Here:

$a' = \frac{11 + 0.5 - 16}{\sqrt{16}} = -\frac{4.5}{4} = -1.125, \quad b' = \frac{21 + 0.5 - 16}{\sqrt{16}} = \frac{5.5}{4} = 1.375,$

so that:

$\Phi(b') - \Phi(a') = \Phi(1.375) - \Phi(-1.125) = \Phi(1.375) + \Phi(1.125) - 1 = 0.7851.$
Thus: 


Exact value: 0.7838, 
Approximate value with continuity correction: 0.7851, 


and the approximation overestimates the probability by only 0.0013, or the 
approximating probability (with continuity correction) is about 100.17% of the 
exact probability. 


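2.1 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s, and for a positive integer k, suppose that
$EX_1^k$ is finite. Form the kth sample mean $\bar{X}_n^{(k)}$ defined by

$\bar{X}_n^{(k)} = \frac{1}{n} \sum_{i=1}^{n} X_i^k.$

Then show that:

$\bar{X}_n^{(k)} \xrightarrow[n \to \infty]{P} EX_1^k.$
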
2.2 Let X be a r.v. with p.d.f. $f_X(x) = c\alpha^x$, $x = 0, 1, \ldots$ $(0 < \alpha < 1)$. Then
$c = 1 - \alpha$ by Exercise 2.8 in Chapter 2.
(i) Show that the m.g.f. of X is: $M_X(t) = \frac{1 - \alpha}{1 - \alpha e^t}$, $t < -\log \alpha$.
(ii) Use the m.g.f. to show that $EX = \frac{\alpha}{1 - \alpha}$.
(iii) If $X_1, \ldots, X_n$ is a random sample from $f_X$, show that the WLLN
holds by showing that

$M_{\bar{X}_n}(t) \xrightarrow[n \to \infty]{} e^{\alpha t/(1 - \alpha)} = M_{EX}(t), \quad t < -\log \alpha.$


Hint: Expand $e^t$ around 0 up to second term, according to Taylor's
formula, $e^t = 1 + t + R(t)$, where $\frac{1}{t}R(t) \to 0$ as $t \to 0$, replace t by $\frac{t}{n}$, and
use the fact that $(1 + \frac{x_n}{n})^n \to e^x$, if $x_n \to x$ as $n \to \infty$.


2.3 Let the r.v. X be distributed as B(150, 0.6). Then: 
(i) Write down the formula for the exact probability P(X < 80). 
(ii) Use the CLT in order to find an approximate value for the above 
probability. (Do not employ the continuity correction.)


2.4 A binomial experiment with probability p of a success is repeated independently 1,000 times, and let X be the r.v. denoting the number of
successes. For $p = \frac{1}{2}$ and $p = \frac{1}{4}$, find:

(i) The exact probability P(1,000p — 50 < X < 1,000p + 50) 
(ii) Use the CLT to find an approximate value for this probability. 


2.5 Let $X_1, \ldots, X_{100}$ be independent r.v.'s distributed as B(1, p). Then:
(i) Write out the expression for the exact probability $P(\sum_{i=1}^{100} X_i = 50)$.
(ii) Use the CLT in order to find an approximate value for this probability.
(iii) What is the numerical value of the probability in part (ii) for p = 0.5?


Hint: For part (ii), set $X = \sum_{i=1}^{100} X_i$, first observe that $P(X = 50) = P(49.5 < X \le 50)$,
and then apply the CLT.


2.6 Fifty balanced dice are tossed once, and let X be the r.v. denoting the 
sum of the upturned spots. Use the CLT to find an approximate value of 
the probability P(150 < X < 200). 


Hint: With the ith die, associate the r.v. $X_i$ which takes on the values
1 through 6, each with probability 1/6. These r.v.'s may be assumed to
be independent and $X = \sum_{i=1}^{50} X_i$.


2.7 One thousand cards are drawn (with replacement) from a standard deck 
of 52 playing cards, and let X be the r.v. denoting the total number of 
aces drawn. Use the CLT to find an approximate value of the probability 
P(65 < X < 90). 


2.8 From a large collection of bolts which is known to contain 3% defec- 
tive bolts, 1,000 are chosen at random, and let X be the r.v. denoting 
the number of defective bolts among those chosen. Use the CLT to find 
an approximate value of the probability that X does not exceed 5% of 
1,000. 


Hint: With the ith bolt drawn, associate the r.v. $X_i$ which takes on
the value 1, if the bolt drawn is defective, and 0 otherwise. Since the
collection of bolts is large, we may assume that after each drawing, the
proportion of the remaining defective bolts remains (approximately)
the same. This implies that the independent r.v.'s $X_1, \ldots, X_{1,000}$ are
distributed as B(1, 0.03) and that $X = \sum_{i=1}^{1,000} X_i \sim B(1{,}000, 0.03)$.


2.9 A manufacturing process produces defective items at the constant (but 
unknown to us) proportion p. Suppose that n items are sampled inde- 
pendently, and let X be the r.v. denoting the number of defective items 
among the n, so that X ~ B(n, p). Determine the smallest value of the
sample size n, so that

$P\left(\left|\frac{X}{n} - p\right| < 0.05\sqrt{pq}\right) \ge 0.95 \quad (q = 1 - p):$

(i) By utilizing the CLT.
(ii) By using the Tchebichev inequality. 
(iii) Compare the answers in parts (i) and (ii). 


2.10 Suppose that 53% of the voters favor a certain legislative proposal. How 
many voters must be sampled so that the observed relative frequency of 
those favoring the proposal will not differ from the assumed frequency 
by more than 2% with probability 0.99? 


Hint: With the ith voter sampled, associate the r.v. $X_i$ which takes
on the value 1, if the voter favors the proposal, and 0 otherwise. Then
it may be assumed that the r.v.'s $X_1, \ldots, X_n$ are independent and their
common distribution is B(1, 0.53). Furthermore, the number of the
voters favoring the proposal is $X = \sum_{i=1}^{n} X_i$. Use the CLT in order to
find the required probability.


2.11 In playing a game, you win or lose $1 with probability 0.5, and you play 
the game independently 1,000 times. Use the CLT to find an approximate 
value of the probability that your fortune (i.e., the total amount you won 
or lost) is at least $10. 


Hint: With the ith game, associate the r.v. $X_i$ which takes on the
value 1 if $1 is won, and −1 if $1 is lost. Then the r.v.'s $X_1, \ldots, X_{1,000}$
are independent, and the fortune X is given by $\sum_{i=1}^{1,000} X_i$.


2.12 It is known that the number of misprints in a page of a certain publication
is a r.v. X having the Poisson distribution with parameter $\lambda$. If $X_1, \ldots, X_n$
are the misprints counted in n pages, use the CLT to determine the (approximate) probability that the total number of misprints is:

(i) Not more than $\lambda n$.
(ii) At least $\lambda n$.
(iii) Between $\lambda n/2$ and $3\lambda n/2$.
(iv) Give the numerical values in parts (i)–(iii) for $\lambda n = 100$ (which may
be interpreted, e.g., as one misprint per 4 pages ($\lambda = 0.25$) in a book
of 400 pages).


2.13 Let the r.v. X be distributed as P(100). Then:
(i) Write down the formula for the exact probability P(X < 116).
(ii) Use the CLT appropriately in order to find an approximate value for
the above probability. (Do not use the continuity correction.)


Hint: Select n large and $\lambda$ small, so that $n\lambda = 100$ and look at X as
the sum $\sum_{i=1}^{n} X_i$ of n independent r.v.'s $X_1, \ldots, X_n$ distributed
as $P(\lambda)$.


2.14 A certain manufacturing process produces vacuum tubes whose lifetimes in hours are independently distributed r.v.'s with Negative Exponential distribution with mean 1,500 hours. Use the CLT in order to find an
approximate value for the probability that the total life of 50 tubes will
exceed 80,000 hours.


2.15 The lifespan of an electronic component in a (complicated) system is a
r.v. X having the Negative Exponential distribution with parameter $\lambda$.

(i) What is the probability that said lifespan will be at least t time
units?

(ii) If the independent r.v.'s $X_1, \ldots, X_n$ represent the lifespans of n spare
items such as the one described above, then $Y = \sum_{i=1}^{n} X_i$ is the
combined lifespan of these n items. Use the CLT in order to find
an approximate value of the probability $P(t_1 < Y < t_2)$, where
$0 < t_1 < t_2$ are given time units.

(iii) Compute the numerical answer in part (i), if $t = -\log(0.9)/\lambda$.

(iv) Do the same for part (ii), if $\lambda = 1/10$, n = 36, $t_1 = 300$, and $t_2 = 420$.


2.16 Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as U(0, 1).
(i) Use the CLT to find an approximate value for the probability
$P(a < \bar{X} < b)$ $(a < b)$.
(ii) What is the numerical value of this probability for n = 12, a = 7/16,
and b = 9/16?


2.17 If the independent r.v.'s $X_1, \ldots, X_{12}$ are distributed as $U(0, \theta)$ $(\theta > 0)$,
use the CLT to show that the probability $P(\frac{\theta}{4} < \bar{X} < \frac{3\theta}{4})$ is approximately
equal to 0.9973.


2.18 Refer to Exercise 3.42 in Chapter 3 and let $X_i$, $i = 1, \ldots, n$ be the diameters of n ball bearings. If $EX_i = \mu = 0.5$ inch and s.d.$(X_i) = \sigma = 0.0005$ inch, use the CLT to determine the smallest value of n for which
$P(|\bar{X} - \mu| < 0.0001) = 0.99$, where $\bar{X}$ is the sample mean of the $X_i$'s.


2.19 The i.i.d. r.v.'s $X_1, \ldots, X_{100}$ have (finite) mean $\mu$ and variance $\sigma^2 = 4$. Use
the CLT to determine the value of the constant c for which $P(|\bar{X} - \mu| < c) = 0.90$, where $\bar{X}$ is the sample mean of the $X_i$'s.


2.20 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with (finite) expectation $\mu$ and (finite and
positive) variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean of the $X_i$'s. Determine the smallest value of the sample size n, in terms of k and p, for
which $P(|\bar{X}_n - \mu| < k\sigma) \ge p$, where $p \in (0, 1)$, $k > 0$. Do so by using:

(i) The CLT.
(ii) The Tchebichev inequality.
(iii) Find the numerical values of n in parts (i) and (ii) for p = 0.90, 0.95,
0.99 and k = 0.50, 0.25, 0.10 for each value of p.


2.21 Refer to Exercise 3.41 in Chapter 3, and suppose that the r.v. X considered there has EX = 2,000 and s.d.(X) = 200, but is not necessarily Normally distributed. Also, consider another manufacturing process producing light bulbs whose mean lifespan is claimed to be 10% higher than the
mean lifespan of the bulbs produced by the existing process; it is assumed
that the s.d. remains the same for the new process. How many bulbs
manufactured by the new process must be examined to establish the
claim of their superiority (should that be the case) with probability
0.95?


Hint: Let Y be the r.v. denoting the lifespan of a light bulb manufactured by the new process. We do not necessarily assume that
Y is Normally distributed. If the claim made is correct, then EY =
2,000 + 10% × 2,000 = 2,200, whereas s.d.(Y) = 200. A random sample
from Y produces the sample mean $\bar{Y}_n$ for which $E\bar{Y}_n = 2{,}200$ (under
the claim) and $Var(\bar{Y}_n) = 200^2/n$, and we must determine n, so that
$P(\bar{Y}_n > 2{,}000) = 0.95$. If the new process were the same as the old
one, then, for all sufficiently large n, $P(\bar{Y}_n > 2{,}000) \simeq 0.50$. So, if
$P(\bar{Y}_n > 2{,}000) = 0.95$, the claim made would draw support.


2.22 (i) Consider the i.i.d. r.v.'s $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ with expectation
$\mu$ and variance $\sigma^2$, both finite, and let $\bar{X}_n$ and $\bar{Y}_n$ be the respective
sample means. Use the CLT in order to determine the sample size n,
so that $P(|\bar{X}_n - \bar{Y}_n| < 0.25\sigma) = 0.95$.
(ii) Let the random samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be as in part (i), but
we do not assume that they are coming from the same distribution.
We do assume, however, that they have the same mean and the same
variance $\sigma^2$, both finite. Then determine n as required above by using
the Tchebichev inequality.


Hint: Set $Z_i = X_i - Y_i$ and then work as in Exercise 2.20(ii) with the
i.i.d. r.v.'s $Z_1, \ldots, Z_n$. Finally, revert to the $X_i$'s and the $Y_i$'s.


2.23 Let $X_i$, $i = 1, \ldots, n$, $Y_i$, $i = 1, \ldots, n$ be independent r.v.'s such that the
$X_i$'s are identically distributed with $EX_i = \mu_1$, $Var(X_i) = \sigma^2$, both finite,
and the $Y_i$'s are identically distributed with $EY_i = \mu_2$ and $Var(Y_i) = \sigma^2$,
both finite. If $\bar{X}_n$ and $\bar{Y}_n$ are the respective sample means of the $X_i$'s and
the $Y_i$'s, then:

(i) Show that $E(\bar{X}_n - \bar{Y}_n) = \mu_1 - \mu_2$, $Var(\bar{X}_n - \bar{Y}_n) = \frac{2\sigma^2}{n}$.
(ii) Use the CLT in order to show that $\frac{\sqrt{n}[(\bar{X}_n - \bar{Y}_n) - (\mu_1 - \mu_2)]}{\sigma\sqrt{2}}$ is asymptotically
distributed as N(0, 1).


Hint: Set $Z_i = X_i - Y_i$ and work with the i.i.d. r.v.'s $Z_1, \ldots, Z_n$; then
revert to the $X_i$'s and the $Y_i$'s.


2.24 An academic department in a university wishes to admit 20 first-year
graduate students. From past experience, it follows that, on the average,
40% of the students admitted will actually accept the admission offer. It
may be assumed that acceptance and rejection of admission offers by the
various students are independent events, and let $Y_n$ be the r.v. denoting
the number of those students actually accepting admission.
(i) Use the CLT in order to determine n, so that the probability
$P(|Y_n - 20| \le 2)$ is maximum. Use a rough geometric argument.
(ii) Compute the (approximate) probability $P(|Y_n - 20| \le 2)$ once n is
determined.
(iii) Derive in a more rigorous way a relation through which n is to
be determined subject to the maximization of the probability
$P(|Y_n - 20| \le 2)$.


Hint: With each one of the n students offered admission, associate
a r.v. $X_i$, which takes on the value 1, if the ith student accepts the
offer, and 0 otherwise. Then the r.v.'s $X_1, \ldots, X_n$ are independent, and
also assume that they have the same distribution; i.e., $P(X_i = 1) = p$ (= 0.40 here) for all i. Then the $X_i$'s are distributed as B(1, p) and
$Y_n = \sum_{i=1}^{n} X_i$. Then, for part (i), draw the N(0, 1) p.d.f. curve, and
by symmetry and geometric considerations conclude that seemingly
(but not precisely) the required probability is maximized for the value
of n for which $18 - np = -(22 - np)$. (Do not use any continuity
correction.)

For part (iii), pretend that n is a continuous variable, differentiate
with respect to it, and equate to 0 in order to arrive at the following
relationship, after some cancellations and modifications:

$\exp\left(\frac{80 - 1.6n}{0.24n}\right) = \frac{0.4n + 22}{0.4n + 18}.$


Remark: The problem may also be posed by replacing 20, 2, and 40%
by c, d, and 100p%, say.


7.3 Further Limit Theorems


Convergence in probability enjoys some of the familiar properties of the usual
pointwise convergence. One such property is stated below in the form of a
theorem whose proof is omitted.


THEOREM 5

(i) For $n \ge 1$, let $X_n$ and X be r.v.'s such that $X_n \xrightarrow[n \to \infty]{P} X$, and let g be
a continuous real-valued function; that is, $g : \mathbb{R} \to \mathbb{R}$ continuous.
Then the r.v.'s $g(X_n)$, $n \ge 1$, also converge in probability to $g(X)$; that
is, $g(X_n) \xrightarrow[n \to \infty]{P} g(X)$. More generally:

(ii) For $n \ge 1$, let $X_n$, $Y_n$, X, and Y be r.v.'s such that $X_n \xrightarrow[n \to \infty]{P} X$, $Y_n \xrightarrow[n \to \infty]{P} Y$,
and let g be a continuous real-valued function; that is, $g : \mathbb{R}^2 \to \mathbb{R}$
continuous. Then the r.v.'s $g(X_n, Y_n)$, $n \ge 1$, also converge in
probability to $g(X, Y)$; that is, $g(X_n, Y_n) \xrightarrow[n \to \infty]{P} g(X, Y)$. (This part
also generalizes in an obvious manner to k sequences $\{X_n^{(i)}\}$, $n \ge 1$, $i = 1, \ldots, k$.)

To this theorem, there is the following important corollary. 


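COROLLARY If $X_n \xrightarrow[n \to \infty]{P} X$ and $Y_n \xrightarrow[n \to \infty]{P} Y$, then:

(i) $aX_n + bY_n \xrightarrow[n \to \infty]{P} aX + bY$, where a and b are constants; and, in particular,
$X_n + Y_n \xrightarrow[n \to \infty]{P} X + Y$.
(ii) $X_n Y_n \xrightarrow[n \to \infty]{P} XY$.
(iii) $\frac{X_n}{Y_n} \xrightarrow[n \to \infty]{P} \frac{X}{Y}$, provided $P(Y_n \ne 0) = P(Y \ne 0) = 1$.

PROOF Although the proof of the theorem was omitted, the corollary can be
proved. Indeed, all one has to do is to take $g : \mathbb{R}^2 \to \mathbb{R}$ as follows, respectively,
for parts (i)–(iii) and observe that it is continuous: $g(x, y) = ax + by$ (and, in
particular, $g(x, y) = x + y$); $g(x, y) = xy$; $g(x, y) = x/y$, $y \ne 0$. ▲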


Actually, a special case of the preceding corollary also holds for conver- 
gence in distribution. Specifically, we have 





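THEOREM 6 (Slutsky) Let $X_n \xrightarrow[n \to \infty]{d} X$ and let $Y_n \xrightarrow[n \to \infty]{d} c$, a constant c rather than a
(proper) r.v. Y. Then:

(i) $X_n + Y_n \xrightarrow[n \to \infty]{d} X + c$; (ii) $X_n Y_n \xrightarrow[n \to \infty]{d} cX$; (iii) $\frac{X_n}{Y_n} \xrightarrow[n \to \infty]{d} \frac{X}{c}$, provided $P(Y_n \ne 0) = 1$ and $c \ne 0$.

In terms of d.f.'s, these convergences are written as follows, always
as $n \to \infty$ and for all $z \in \mathbb{R}$ for which: $z - c$ is a continuity point of $F_X$
for part (i); $z/c$ is a continuity point of $F_X$ for part (ii); $cz$ is a continuity
point of $F_X$ for part (iii):

$P(X_n + Y_n \le z) \to P(X + c \le z) = P(X \le z - c)$, or $F_{X_n + Y_n}(z) \to F_X(z - c)$;

$P(X_n Y_n \le z) \to P(cX \le z) = \begin{cases} P(X \le \frac{z}{c}), & c > 0 \\ P(X \ge \frac{z}{c}), & c < 0, \end{cases}$ or

$F_{X_n Y_n}(z) \to \begin{cases} F_X(\frac{z}{c}), & c > 0 \\ 1 - P(X < \frac{z}{c}) \; (= 1 - F_X(\frac{z}{c}), \text{ if } F_X \text{ is continuous}), & c < 0; \end{cases}$

$P\left(\frac{X_n}{Y_n} \le z\right) \to P\left(\frac{X}{c} \le z\right) = \begin{cases} P(X \le cz), & c > 0 \\ P(X \ge cz), & c < 0, \end{cases}$ or

$F_{X_n/Y_n}(z) \to \begin{cases} F_X(cz), & c > 0 \\ 1 - P(X < cz) \; (= 1 - F_X(cz), \text{ if } F_X \text{ is continuous}), & c < 0. \end{cases}$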











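The proof of this theorem, although conceptually not complicated, is, nevertheless, long and is omitted. Recall, however, that $Y_n \xrightarrow[n \to \infty]{P} c$ if and only if
$Y_n \xrightarrow[n \to \infty]{d} c$, and this is another way the convergence of $Y_n$ is stated.

As a simple concrete application of Theorem 6, consider the following
example.


Suppose $X_n \xrightarrow[n \to \infty]{d} X \sim N(\mu, \sigma^2)$, and let $c_n$, c, $d_n$, and d be constants such that
$c_n \to c$ and $d_n \to d$. Then $c_n X_n + d_n \xrightarrow[n \to \infty]{d} Y \sim N(c\mu + d, c^2\sigma^2)$.


DISCUSSION Trivially, $c_n \xrightarrow[n \to \infty]{d} c$ and $d_n \xrightarrow[n \to \infty]{d} d$, so that, by Theorem 6(ii),
$c_n X_n \xrightarrow[n \to \infty]{d} cX$, and by Theorem 6(i), $c_n X_n + d_n \xrightarrow[n \to \infty]{d} cX + d$. However, $X \sim N(\mu, \sigma^2)$ implies that $cX + d \sim N(c\mu + d, c^2\sigma^2)$. Thus, $c_n X_n + d_n \xrightarrow[n \to \infty]{d} cX + d = Y \sim N(c\mu + d, c^2\sigma^2)$.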


The following result is an application of Theorems 5 and 6 and is of much
use in statistical inference. For its formulation, let $X_1, \ldots, X_n$ be i.i.d. r.v.'s
with finite mean $\mu$ and finite and positive variance $\sigma^2$, and let $\bar{X}_n$ and $S_n^2$ be
the sample mean and the “adjusted” (in the sense that $\mu$ is replaced by $\bar{X}_n$)
sample variance (which we have denoted by $\bar{S}^2$ in relation (13) of Chapter 5);
that is, $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, $S_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$.


THEOREM 7

Under the assumptions just made and the notation introduced, it holds:

(i) $S_n^2 \xrightarrow[n \to \infty]{P} \sigma^2$; (ii) $\frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1)$.











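PROOF (i) Recall that $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}_n^2$, so that $S_n^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}_n^2$. Since $EX_i^2 = Var(X_i) + (EX_i)^2 = \sigma^2 + \mu^2$, the WLLN applies to the i.i.d. r.v.'s $X_1^2, \ldots, X_n^2$ and gives: $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow[n \to \infty]{P} \sigma^2 + \mu^2$. Also,
$\bar{X}_n \xrightarrow[n \to \infty]{P} \mu$, by the WLLN again, and then $\bar{X}_n^2 \xrightarrow[n \to \infty]{P} \mu^2$ by Theorem 5(i). Then,
by Theorem 5(ii),

$\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}_n^2 \xrightarrow[n \to \infty]{P} (\sigma^2 + \mu^2) - \mu^2 = \sigma^2,$

which is what part (i) asserts.


(ii) Part (i) and Theorem 5(i) imply that $S_n \xrightarrow[n \to \infty]{P} \sigma$, or $\frac{S_n}{\sigma} \xrightarrow[n \to \infty]{P} 1$. By
Theorem 4, $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1)$. Then Theorem 6(iii) applies and
gives:

$\frac{\sqrt{n}(\bar{X}_n - \mu)/\sigma}{S_n/\sigma} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1).$ ▲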




REMARK 6 Part (ii) of the theorem states, in effect, that for sufficiently large
n, $\sigma$ may be replaced in the CLT by the adjusted sample standard deviation $S_n$,
and the resulting expression still has a distribution which is close to the N(0, 1)
distribution.
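
The replacement of $\sigma$ by $S_n$ can be illustrated by simulation. The sketch below is only an illustration under assumptions of our own choosing, none of which come from the text: the underlying distribution is Negative Exponential with mean 1, n = 200, 2,000 repetitions, and a fixed seed. It records how often $|\sqrt{n}(\bar{X}_n - \mu)/S_n|$ stays below the 0.975 quantile of N(0, 1); if the N(0, 1) approximation is adequate, that fraction should be near 0.95.

```python
import math
import random
from statistics import NormalDist


def studentized_mean(sample, mu):
    """Return sqrt(n) * (mean - mu) / S_n, where S_n^2 uses the divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / n
    return math.sqrt(n) * (mean - mu) / math.sqrt(s2)


def fraction_within(z, n, reps, rate=1.0, seed=12345):
    """Fraction of simulated samples whose studentized mean lies in [-z, z]."""
    rng = random.Random(seed)      # fixed seed: an arbitrary illustrative choice
    mu = 1.0 / rate                # mean of the Negative Exponential distribution
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        if abs(studentized_mean(sample, mu)) <= z:
            hits += 1
    return hits / reps


z = NormalDist().inv_cdf(0.975)    # 0.975 quantile of N(0, 1), about 1.96
observed = fraction_within(z, n=200, reps=2000)
print(f"fraction within +/- {z:.3f}: {observed:.3f} (N(0, 1) would give 0.950)")
```

A simulation of this kind supports, but of course does not prove, the statement of the theorem.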

The WLLN states that $\bar{X}_n \xrightarrow[n \to \infty]{P} \mu$, which, for a real-valued continuous function g, implies that

$g(\bar{X}_n) \xrightarrow[n \to \infty]{P} g(\mu).$ (14)

On the other hand, the CLT states that:

$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow[n \to \infty]{d} N(0, 1)$ or $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow[n \to \infty]{d} N(0, \sigma^2).$

The question then arises what happens to the distribution of $g(\bar{X}_n)$. In other
words, is there a result analogous to (14) when the distribution of $g(\bar{X}_n)$ is
involved? The question is answered by the following result.





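THEOREM 8

Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with finite mean $\mu$ and variance $\sigma^2 \in (0, \infty)$,
and let $g : \mathbb{R} \to \mathbb{R}$ be differentiable with derivative $g'$ continuous at $\mu$.
Then:

$\sqrt{n}\,[g(\bar{X}_n) - g(\mu)] \xrightarrow[n \to \infty]{d} N(0, [\sigma g'(\mu)]^2).$ (15)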








The proof of this result involves the employment of some of the theorems
established in this chapter, including the CLT, along with a Taylor expansion.
The proof itself will not be presented, and this section will be concluded with
an application of Theorem 8. The method of establishing asymptotic normality
for $g(\bar{X}_n)$ is often referred to as the delta method, and it also applies in cases
more general than the one described here.


APPLICATION Let the independent r.v.'s $X_1, \ldots, X_n$ be distributed as
B(1, p). Then:

$\sqrt{n}\,[\bar{X}_n(1 - \bar{X}_n) - pq] \xrightarrow[n \to \infty]{d} N(0, pq(1 - 2p)^2) \quad (q = 1 - p).$ (16)

PROOF Here $\mu = p$, $\sigma^2 = pq$, and $g(x) = x(1 - x)$, so that $g'(x) = 1 - 2x$,
continuous for all x. Since $g(\bar{X}_n) = \bar{X}_n(1 - \bar{X}_n)$, $g(\mu) = p(1 - p) = pq$, and
$g'(\mu) = 1 - 2p$, the convergence in (15) becomes as stated in (16). ▲
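
As a numerical consistency check on (16), one can compute exactly, for finite n, the second moment of $\sqrt{n}\,[\bar{X}_n(1 - \bar{X}_n) - pq]$ and compare it with the limiting variance $pq(1 - 2p)^2$. The sketch below makes assumptions that are not in the text: p = 0.3, the sample sizes 50, 500, and 5,000, and the function names are all illustrative. It uses the fact that $n\bar{X}_n \sim B(n, p)$ and sums over the Binomial p.d.f.; agreement of second moments is a check on the limit, not a proof of convergence in distribution.

```python
import math


def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ B(n, p), via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def scaled_second_moment(n, p):
    """Exact E{ n * [Xbar(1 - Xbar) - pq]^2 } when n * Xbar ~ B(n, p)."""
    q = 1 - p
    total = 0.0
    for k in range(n + 1):
        xbar = k / n
        dev = xbar * (1 - xbar) - p * q
        total += math.exp(binom_log_pmf(k, n, p)) * n * dev * dev
    return total


def limiting_variance(p):
    """Variance [sigma * g'(mu)]^2 = pq(1 - 2p)^2 appearing in (16)."""
    return p * (1 - p) * (1 - 2 * p) ** 2


p = 0.3
limit = limiting_variance(p)
values = {n: scaled_second_moment(n, p) for n in (50, 500, 5000)}
for n, value in values.items():
    print(f"n = {n:5d}: exact second moment = {value:.6f}, limit = {limit:.6f}")
```

Note that for p = 1/2 the limiting variance in (16) is 0, so the limit is degenerate at 0 in that case.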


3.1 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with finite $EX_i = \mu$, and $Var(X_i) = \sigma^2 \in (0, \infty)$ so that the CLT holds; that is,

$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1)$, where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.

Then use Theorem 6 in order to show that the WLLN also holds.









Chapter 8

An Overview of Statistical Inference


A review of the previous chapters reveals that the main objectives throughout 
have been those of calculating probabilities or certain summary characteristics 
of a distribution, such as mean, variance, median, and mode. However, for 
these calculations to result in numerical answers, it is a prerequisite that the 
underlying distribution be completely known. Typically, this is rarely, if ever, 
the case. The reason for this is that the parameters which appear, for example, 
in the functional form of the p.d.f. of a distribution are simply unknown to us. 
The only thing known about them is that they lie in specified sets of possible 
values for these parameters, the parameter space. 

It is at this point where statistical inference enters the picture. Roughly 
speaking, the aim of statistical inference is to make certain determinations 
with regard to the unknown constants (parameters) figuring in the underlying 
distribution. This is to be done on the basis of data, represented by the ob- 
served values of a random sample drawn from said distribution. Actually, this 
is the so-called parametric statistical inference as opposed to the nonpara- 
metric statistical inference. The former is applicable to distributions, which 
are completely determined by the knowledge of a finite number of parame- 
ters. The latter applies to distributions not determined by any finite number of 
parameters. 

The remaining part of this book is, essentially, concerned with statistical 
inference and mostly with parametric statistical inference. Within the frame- 
work of parametric statistical inference, there are three main objectives, de- 
pending on what kind of determinations we wish to make with regard to the 
parameters. If the objective is to arrive at a number, by means of the avail- 
able data, as the value of an unknown parameter, then we are talking about 
point estimation. If, on the other hand, we are satisfied with the statement 
that an unknown parameter lies within a known random interval (that is, an 
interval with r.v.'s as its end-points) with high prescribed probability, then we
are dealing with interval estimation or confidence intervals. Finally, if the
objective is to decide that an unknown parameter lies in a specified subset of 
the parameter space, then we are in the area of testing hypotheses. 

These three subjects — point estimation, interval estimation, and testing 
hypotheses — are briefly discussed in the following three sections. In the sub- 
sequent three sections, it is pointed out what the statistical inference issues are 
in specific models — a regression model and two analysis of variance mod- 
els. The final section touches upon some aspects of nonparametric statistical 
inference. 


8.1 The Basics of Point Estimation


The problem here, briefly stated, is as follows. Let X be a r.v. with a p.d.f. f
which, however, involves a parameter. This is the case, for instance, in the
Binomial distribution B(1, p), the Poisson distribution $P(\lambda)$, the Negative Exponential distribution $f(x) = \lambda e^{-\lambda x}$, $x > 0$, the Uniform distribution $U(0, \alpha)$,
and the Normal distribution $N(\mu, \sigma^2)$ with one of the quantities $\mu$ and $\sigma^2$
known. The parameter is usually denoted by $\theta$, and the set of its possible values is denoted by $\Omega$ and is called the parameter space. In order to emphasize
the fact that the p.d.f. depends on $\theta$, we write $f(\cdot; \theta)$. Thus, in the distributions
mentioned above, we have for the respective p.d.f.'s and the parameter spaces:


$f(x; \theta) = \theta^x (1 - \theta)^{1 - x}, \quad x = 0, 1, \quad \theta \in \Omega = (0, 1).$


The situations described in Examples 5, 6, 8, 9, and 10 of Chapter 1 may be 
described by a Binomial distribution. 


$f(x; \theta) = \frac{e^{-\theta}\theta^x}{x!}, \quad x = 0, 1, \ldots, \quad \theta \in \Omega = (0, \infty).$





The Poisson distribution can be used appropriately in the case described in 
Example 12 of Chapter 1. 


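$f(x; \theta) = \theta e^{-\theta x}, \quad x > 0, \quad \theta \in \Omega = (0, \infty).$


$f(x; \theta) = \begin{cases} \frac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise}, \end{cases} \quad \theta \in \Omega = (0, \infty).$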











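$f(x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \theta)^2}{2\sigma^2}}, \quad x \in \mathbb{R}, \quad \theta \in \Omega = \mathbb{R}, \quad \sigma \text{ known},$

and

$f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-\frac{(x - \mu)^2}{2\theta}}, \quad x \in \mathbb{R}, \quad \theta \in \Omega = (0, \infty), \quad \mu \text{ known}.$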


Normal distributions are suitable for modeling the situations described in 
Examples 16 and 17 of Chapter 1. 

Our objective is to draw a random sample of size n, $X_1, \ldots, X_n$, from the
underlying distribution, and on the basis of it to construct a point estimate
(or estimator) for $\theta$, that is, a statistic $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$, which is used for
estimating $\theta$, where a statistic is a known function of the random sample
$X_1, \ldots, X_n$. If $x_1, \ldots, x_n$ are the actually observed values of the r.v.'s $X_1, \ldots, X_n$,
respectively, then the observed value of our estimate has the numerical value
$\hat{\theta}(x_1, \ldots, x_n)$. The observed values $x_1, \ldots, x_n$ are also referred to as data. Then,
on the basis of the available data, it is declared that the value of $\theta$ is $\hat{\theta}(x_1, \ldots, x_n)$
from among all possible points in $\Omega$. A point estimate is often referred to just as
an estimate, and the notation $\hat{\theta}$ is used indiscriminately, both for the estimate
$\hat{\theta}(X_1, \ldots, X_n)$ (which is a r.v.) and for its observed value $\hat{\theta}(x_1, \ldots, x_n)$ (which
is just a number).

The only obvious restriction on $\hat{\theta}(x_1, \ldots, x_n)$ is that it lies in $\Omega$ for all
possible values of $X_1, \ldots, X_n$. Apart from this, there is any number of estimates
one may construct; thus the need to assume certain principles and/or invent
methods for constructing $\hat{\theta}$. Perhaps the most widely accepted principle is
the so-called principle of Maximum Likelihood (ML). This principle dictates
that we form the joint p.d.f. of the $X_i$'s, evaluated at the observed values $x_i$ of the $X_i$'s,
look at this joint p.d.f. as a function of $\theta$ (and call it the likelihood function),
and maximize the likelihood function with respect to $\theta$. The maximizing point
(assuming it exists and is unique) is a function of $x_1, \ldots, x_n$, and is what we
call the Maximum Likelihood Estimate (MLE) of $\theta$. The notation used for the
likelihood function is $L(\theta \mid x_1, \ldots, x_n)$. Then, we have that:


$L(\theta \mid x_1, \ldots, x_n) = f(x_1; \theta) \cdots f(x_n; \theta), \quad \theta \in \Omega.$


The MLE will be studied fairly extensively in Chapter 9. 
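
The maximization just described can be sketched numerically. The example below is only an illustration: the data are hypothetical counts, the Poisson p.d.f. listed above is assumed as the model, and a crude grid search stands in for the calculus-based maximization that a full treatment would use. It works with $\log L$ rather than $L$, which has the same maximizing point, and compares the result with the sample mean, which is the well-known MLE of $\theta$ in the Poisson case.

```python
import math


def poisson_log_likelihood(theta, data):
    """log L(theta | x_1, ..., x_n) for the p.d.f. e^{-theta} theta^x / x!."""
    return sum(-theta + x * math.log(theta) - math.lgamma(x + 1) for x in data)


def grid_mle(data, lo=0.01, hi=10.0, steps=1000):
    """Grid point in [lo, hi] at which the log-likelihood is largest."""
    best_theta = lo
    best_value = poisson_log_likelihood(lo, data)
    for i in range(1, steps + 1):
        theta = lo + (hi - lo) * i / steps
        value = poisson_log_likelihood(theta, data)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


data = [2, 0, 3, 1, 4, 2, 1, 0, 2, 3]   # hypothetical observed values x_1, ..., x_n
step = (10.0 - 0.01) / 1000              # spacing of the default grid
mle = grid_mle(data)
sample_mean = sum(data) / len(data)
print(f"grid MLE = {mle:.4f}, sample mean = {sample_mean:.4f}")
```

The grid value agrees with the sample mean only up to the grid spacing; a finer grid, or a proper optimizer, would tighten the agreement.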

Another principle often used in constructing an estimate for $\theta$ is the principle of unbiasedness. In this context, an estimate is usually denoted by
$U = U(X_1, \ldots, X_n)$. Then the principle of unbiasedness dictates that U should
be constructed so as to be unbiased; that is, its expectation (mean value)
should always be $\theta$, no matter what the value of $\theta$ in $\Omega$. More formally, $E_\theta U = \theta$
for all $\theta \in \Omega$. (In the expectation sign E, the parameter $\theta$ was inserted to indicate that this expectation does depend on $\theta$, since it is calculated by using the
p.d.f. $f(\cdot; \theta)$.) Now, it is intuitively clear that, in comparing two unbiased estimates, one would pick the one with the smaller variance, since it would be more
closely concentrated around its mean $\theta$. Envision the case that, within the class
of all unbiased estimates, there exists one which has the smallest variance (and
that is true for all $\theta \in \Omega$). Such an estimate is called a Uniformly Minimum
Variance Unbiased (UMVU) estimate and is, clearly, a desirable estimate. In
the next chapter, we will see how we go about constructing such estimates.

The principle (or rather the method) based on sample moments is another
way of constructing estimates. The method of moments, in the simplest case,
dictates forming the sample mean $\bar{X}$ and equating it with the (theoretical) mean
$E_\theta X$. Then solve for $\theta$ (assuming it can be done, and, indeed, uniquely) in order
to arrive at a moment estimate of $\theta$.
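
As a brief worked illustration of this recipe, consider two of the p.d.f.'s listed earlier in this section. The notation $\tilde{\theta}$ for the moment estimate is our own choice here and is not taken from the text.

```latex
\begin{align*}
U(0, \theta):\quad & E_\theta X = \frac{\theta}{2},
  \qquad \bar{X} = \frac{\theta}{2}
  \;\Longrightarrow\; \tilde{\theta} = 2\bar{X}; \\
f(x; \theta) = \theta e^{-\theta x},\ x > 0:\quad & E_\theta X = \frac{1}{\theta},
  \qquad \bar{X} = \frac{1}{\theta}
  \;\Longrightarrow\; \tilde{\theta} = \frac{1}{\bar{X}}.
\end{align*}
```

In both cases the equation $\bar{X} = E_\theta X$ has a unique solution in $\theta$, which is the situation the method presupposes.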

A much more sophisticated method of constructing estimates of $\theta$ is the
so-called decision-theoretic method. This method calls for the introduction of
a host of concepts, terminology, and notation, and it will be taken up in the
next chapter.

Finally, another relatively popular method (in particular, in the context of
certain models) is the method of Least Squares (LS). The method of LS leads
to the construction of an estimate for $\theta$, the Least Squares Estimate (LSE) of
$\theta$, through a minimization (with respect to $\theta$) of the sum of certain squares.
This sum of squares represents squared deviations between what we actually
observe after experimentation is completed and what we would expect to have
on the basis of an assumed model. Once again, details will be presented later
on, more specifically, in Chapter 13.

In all of the preceding discussion, it was assumed that the underlying p.d.f.
depended on a single parameter, which was denoted by $\theta$. It may very well be
the case that there are two or more parameters involved. This may happen, for
instance, in the Uniform distribution $U(\alpha, \beta)$, $-\infty < \alpha < \beta < \infty$, where both $\alpha$
and $\beta$ are unknown; the Normal distribution, $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$
are unknown; and it does happen in the Multinomial distribution, where the
number of parameters is k, $p_1, \ldots, p_k$ (or more precisely, $k - 1$, since the kth
parameter, for example, is $p_k = 1 - p_1 - \cdots - p_{k-1}$). For instance, Examples 20
and 21 of Chapter 1 refer to situations where a Multinomial distribution is appropriate. In such multiparameter cases, one simply applies to each parameter
separately what was said above for a single parameter. The alternative option
to use the vector notation for the parameters involved does simplify things in
a certain way but also introduces some complications in other ways.


8.2 The Basics of Interval Estimation


Suppose we are interested in constructing a point estimate of the mean $\mu$ in
the Normal distribution $N(\mu, \sigma^2)$ with known variance; this is to be done on
the basis of a random sample of size n, $X_1, \ldots, X_n$, drawn from the underlying
distribution. This amounts to constructing a suitable statistic of the $X_i$'s, call
it $V = V(X_1, \ldots, X_n)$, which for the observed values $x_i$ of $X_i$, $i = 1, \ldots, n$, is
a numerical entity, and declare it to be the (unknown) value of $\mu$. This looks
somewhat presumptuous, since from the set of possible values for $\mu$, $-\infty < \mu < \infty$, just one is selected as its value. Thinking along these lines, it might
be more reasonable to aim instead at a random interval which will contain the
(unknown) value of $\mu$ with high (prescribed) probability. This is exactly what
a confidence interval does.

To be more precise and in casting the problem in a general setting, let
$X_1, \ldots, X_n$ be a random sample from the p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and let
$L = L(X_1, \ldots, X_n)$ and $U = U(X_1, \ldots, X_n)$ be two statistics of the $X_i$'s such
that $L \le U$. Then the interval with end-points L and U, [L, U], is called a
random interval. Let $\alpha$ be a small number in (0, 1), such as 0.005, 0.01, 0.05,
and suppose that the random interval [L, U] contains $\theta$ with probability equal
to $1 - \alpha$ (such as 0.995, 0.99, 0.95) no matter what the true value of $\theta$ in $\Omega$ is.
In other words, suppose that:


$P_\theta(L \le \theta \le U) = 1 - \alpha \quad \text{for all } \theta \in \Omega.$ (1)

If relation (1) holds, then we say that the random interval [L, U] is a confidence
interval for $\theta$ with confidence coefficient $1 - \alpha$.

The interpretation of the significance of a confidence interval is based on
the relative frequency interpretation of the concept of probability, and it goes
like this: Suppose $n$ independent r.v.'s are drawn from the p.d.f. $f(\cdot\,; \theta)$, and let
$x_1, \ldots, x_n$ be their observed values. Also, let $[L_1, U_1]$ be the interval resulting
from the observed values of $L = L(X_1, \ldots, X_n)$ and $U = U(X_1, \ldots, X_n)$; that
is, $L_1 = L(x_1, \ldots, x_n)$ and $U_1 = U(x_1, \ldots, x_n)$. Proceed to draw independently
a second set of $n$ r.v.'s as above, and let $[L_2, U_2]$ be the resulting interval.
Repeat this process independently a large number of times, $N$, say, with the
last interval being $[L_N, U_N]$. Then the interpretation of (1) is that,
on the average, about $100(1 - \alpha)\%$ of the above $N$ intervals will, actually,
contain the true value of $\theta$. For example, for $\alpha = 0.05$ and $N = 1{,}000$, the
proportion of such intervals will be about 95%; that is, one would expect about
950 out of the 1,000 intervals constructed as above to contain the true value of $\theta$.
Empirical evidence shows that such an expectation is valid.
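
The relative frequency interpretation can be checked by simulation. The following
is a minimal sketch, not taken from the text: it assumes a Normal sample with known
$\sigma$ and uses the familiar interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$; the function name
`coverage` and all parameter values are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, N=2000, seed=1):
    """Fraction of N simulated intervals [L, U] that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    half = z * sigma / n ** 0.5               # half-width; sigma is treated as known
    hits = 0
    for _ in range(N):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half <= mu <= xbar + half:  # does [L, U] contain mu?
            hits += 1
    return hits / N

print(coverage())  # expected to be close to 1 - alpha = 0.95
```

The printed proportion varies with the seed, but it should stay near $1 - \alpha$.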

We may also define an upper confidence limit for $\theta$, $U = U(X_1, \ldots, X_n)$,
and a lower confidence limit for $\theta$, $L = L(X_1, \ldots, X_n)$, both with confidence
coefficient $1 - \alpha$, if, respectively, the intervals $(-\infty, U]$ and $[L, \infty)$ are
confidence intervals for $\theta$ with confidence coefficient $1 - \alpha$. That is to say:


$$P_\theta(-\infty < \theta \le U) = 1 - \alpha, \qquad P_\theta(L \le \theta < \infty) = 1 - \alpha \quad \text{for all } \theta \in \Omega. \tag{2}$$


Confidence intervals and upper and/or lower confidence limits can be sought, 
for instance, in Examples 5, 6, 8, 9, and 10 (Binomial distribution), 12 (Poisson 
distribution), and 16 and 17 (Normal distribution) in Chapter 1. 

There are some variations of (1) and (2). For example, when the underlying
p.d.f. is discrete, then equalities in (1) and (2) rarely hold for a given $\alpha$ and
have to be replaced by inequalities $\ge$. Also, except for special cases, equalities
in (1) and (2) are valid only approximately for large values of the sample size
$n$ (even in cases where the underlying r.v.'s are continuous). In such cases, we
say that the respective confidence intervals (confidence limits) have confidence
coefficient approximately $1 - \alpha$.

Finally, the parameters of interest may be two (or more) rather than one,
as we assumed so far. In such cases, the concept of a confidence interval is
replaced by that of a confidence region (in the multidimensional parameter
space $\Omega$). This concept will be illustrated by an example in Chapter 10. In the
same chapter, we will also expand considerably on what was briefly discussed
here.
Often, we are not interested in a point estimate of a parameter $\theta$ or even a
confidence interval for it, but rather whether said parameter lies or does not
lie in a specified subset $\omega$ of the parameter space $\Omega$. To clarify this point, we
refer to some of the examples described in Chapter 1. Thus, in Example 5, all
we might be interested in is whether Jones has ESP at all or not and not to
what degree he does. In statistical terms, this amounts to taking $n$ independent
observations from a $B(1, \theta)$ distribution and, on the basis of these observations,
deciding whether $\theta \in \omega = (0, 0.5]$ (as opposed to $\theta \in \omega^c = (0.5, 1)$); here $\theta$
is the probability that Jones correctly identifies the picture. The situation in
Example 6 is similar, and the objective might be to decide whether or not
$\theta \in \omega = (\theta_0, 1)$; here $\theta$ is the true proportion of unemployed workers and $\theta_0$ is
a certain desirable or guessed value of $\theta$. Examples 8, 9, and 10 in Chapter 1
fall into the same category.

In Example 12, the stipulated model is a Poisson distribution $P(\theta)$ and, on
the basis of $n$ independent observations, we might wish to decide whether or
not $\theta \in (\theta_0, \infty)$, where $\theta_0$ is a known value of $\theta$.

In Example 16, the stipulated underlying models may be Normal distributions
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ for the survival times $X$ and $Y$, respectively, and
then the question of interest may be to decide whether or not $\mu_2 < \mu_1$; $\sigma^2$ may
be assumed to be either known or unknown. Of course, we are going to arrive
at the desirable decision on the basis of two independent random samples
drawn from the underlying distributions. Example 17 is of the same type.

In Example 20, the statistical problem is that of comparing two Multinomial
populations, by making appropriate statements about the probabilities
$p_{AE}, p_{AA}, p_{AP}$ and $p_{BE}, p_{BA}, p_{BP}$; here $p_{AE}$ is the probability that any one of the
80 infants, subjected to diet A, is of “excellent” health, and similarly for the
remaining probabilities. Example 21 is of a similar type.

On the basis of the preceding discussion and examples, we may now proceed
with the formulation of the general problem. To this effect, let $X_1, \ldots, X_n$
be i.i.d. r.v.'s with p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}^r$, $r \ge 1$, and by means of this
random sample, suppose we are interested in checking whether $\theta \in \omega$, a proper
subset of $\Omega$, or $\theta \in \omega^c$, the complement of $\omega$ with respect to $\Omega$. The statements
that $\theta \in \omega$ and $\theta \in \omega^c$ are called (statistical) hypotheses (about $\theta$), and are
denoted thus: $H_0 : \theta \in \omega$, $H_A : \theta \in \omega^c$. The hypothesis $H_0$ is called a null
hypothesis and the hypothesis $H_A$ is called the alternative hypothesis (to $H_0$). The
hypotheses $H_0$ and $H_A$ are called simple, if they contain a single point, and
composite otherwise. The procedure of checking whether $H_0$ is true or not, on
the basis of the observed values $x_1, \ldots, x_n$ of $X_1, \ldots, X_n$, is called testing the
hypothesis $H_0$ against the alternative $H_A$.

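In the special case that $\Omega \subseteq \mathbb{R}$, some null hypotheses and the respective
alternatives are as follows:


$H_0 : \theta = \theta_0$ against $H_A : \theta > \theta_0$;   $H_0 : \theta = \theta_0$ against $H_A : \theta < \theta_0$;
$H_0 : \theta \le \theta_0$ against $H_A : \theta > \theta_0$;   $H_0 : \theta \ge \theta_0$ against $H_A : \theta < \theta_0$;
$H_0 : \theta = \theta_0$ against $H_A : \theta \ne \theta_0$.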


The testing is carried out by means of a function $\varphi : \mathbb{R}^n \to [0, 1]$ which is
called a test function or just a test. The number $\varphi(x_1, \ldots, x_n)$ represents the
probability of rejecting $H_0$, given that $X_i = x_i$, $i = 1, \ldots, n$. In its simplest form,
$\varphi$ is the indicator of a set $B$ in $\mathbb{R}^n$, which is called the critical or rejection region;
its complement $B^c$ is called the acceptance region. Thus, $\varphi(x_1, \ldots, x_n) = 1$ if
$x_1, \ldots, x_n$ are in $B$, and $\varphi(x_1, \ldots, x_n) = 0$, otherwise. Actually, such a test is
called a nonrandomized test as opposed to tests which also take values strictly
between 0 and 1 and are called randomized tests. In the case of continuous
distributions, nonrandomized tests suffice, but in discrete distributions, a test
will typically be required to take on one or two values strictly between 0 and 1.
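
To see why randomization is needed in the discrete case, consider the following
sketch, which is an illustration and not a construction from the text. For
$T = X_1 + \cdots + X_n \sim B(n, \theta_0)$ it finds a cut-off `c` and a probability `gamma`
(both names hypothetical) such that rejecting when $T > c$, and rejecting with
probability `gamma` when $T = c$, has rejection probability exactly $\alpha$ at $\theta_0$.

```python
from math import comb

def binom_pmf(t, n, theta):
    """P(T = t) for T ~ B(n, theta)."""
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def randomized_cutoff(n, theta0, alpha):
    """Return (c, gamma) so that P(T > c) + gamma * P(T = c) = alpha at theta0."""
    tail = 0.0                                   # P(T > c) at the time of each check
    for c in range(n, -1, -1):
        p_c = binom_pmf(c, n, theta0)
        if tail + p_c > alpha:                   # adding all of {T = c} overshoots alpha
            return c, (alpha - tail) / p_c
        tail += p_c
    return -1, 0.0                               # alpha >= 1: always reject

def phi(t, c, gamma):
    """Test function: probability of rejecting H0 when T = t is observed."""
    if t > c:
        return 1.0
    return gamma if t == c else 0.0

n, theta0, alpha = 10, 0.5, 0.05
c, gamma = randomized_cutoff(n, theta0, alpha)
size = sum(phi(t, c, gamma) * binom_pmf(t, n, theta0) for t in range(n + 1))
print(c, gamma, size)
```

A nonrandomized test with rejection region $\{T > c\}$ or $\{T \ge c\}$ would have
rejection probability at $\theta_0$ strictly below or strictly above $\alpha$ in this example.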

By using a test $\varphi$, suppose that our data $x_1, \ldots, x_n$ lead us to the rejection of
$H_0$. This will happen, for instance, if the test $\varphi$ is nonrandomized with rejection
region $B$, and the $x_i$'s lie in $B$. By rejecting the hypothesis $H_0$, we may be doing
the correct thing, because $H_0$ is false (that is, $\theta \notin \omega$). On the other hand, we
may be taking the wrong action because it may happen that $H_0$ is, indeed, true
(that is, $\theta \in \omega$), only the test and the data do not reveal it. Clearly, in so doing,
we commit an error, which is referred to as type I error. Of course, we would
like to find ways of minimizing the frequency of committing this error. To put it
more mathematically, this means searching for a rejection region $B$, which will
minimize the above frequency. In our framework, frequencies are measured
by probabilities, and this leads to a determination of $B$ so that


$$\begin{aligned}
P(\text{of type I error}) &= P(\text{of rejecting } H_0 \text{ whereas } H_0 \text{ is true}) \\
&= P_\theta(X_1, \ldots, X_n \text{ lie in } B \text{ whereas } \theta \in \omega) \\
&= P_\theta(X_1, \ldots, X_n \text{ lie in } B \mid \theta \in \omega) \overset{\text{def}}{=} \alpha(\theta) \text{ is minimum.}
\end{aligned} \tag{3}$$


Clearly, the probabilities $\alpha(\theta)$ in (3) must be minimized for each $\theta \in \omega$, since
we don't know which value in $\omega$ is the true $\theta$. This will happen if we minimize
the $\max_{\theta \in \omega} \alpha(\theta) = \alpha$. This maximum probability of type I error is called
the level of significance of the test employed. Thus, we are led to selecting the
rejection region $B$ so that its level of significance $\alpha$ will be minimum. Since
$\alpha \ge 0$, its minimum value would be 0, and this would happen if (essentially)
$B = \emptyset$. But then (essentially) the $x_i$'s would always be in $B^c = \mathbb{R}^n$, and this
would happen with probability


$$P_\theta(X_1, \ldots, X_n \text{ in } \mathbb{R}^n) = 1 \quad \text{for all } \theta. \tag{4}$$


This, however, creates a problem for the following reason. If the rejection
region $B$ is $\emptyset$, then the acceptance region is $\mathbb{R}^n$; that is, we always accept $H_0$.
As long as $H_0$ is true (that is, $\theta \in \omega$), this is exactly what we wish to do, but
what about the case that $H_0$ is false (that is, $\theta \in \omega^c$)? When we accept a false
hypothesis $H_0$, we commit an error, which is called the type II error. As in (3),
this error is also measured in terms of probabilities; namely,


$$\begin{aligned}
P(\text{of type II error}) &= P(\text{of accepting } H_0 \text{ whereas } H_0 \text{ is false}) \\
&= P_\theta(X_1, \ldots, X_n \text{ lie in } B^c \text{ whereas } \theta \in \omega^c) \\
&= P_\theta(X_1, \ldots, X_n \text{ lie in } B^c \mid \theta \in \omega^c) \\
&\overset{\text{def}}{=} \beta(\theta).
\end{aligned} \tag{5}$$




According to (5), these probabilities would be 1 for all $\theta \in \omega^c$ (actually,
for all $\theta \in \Omega$), if $B = \emptyset$. Clearly, this is undesirable. The preceding discussion
then leads to the conclusion that the rejection region $B$ must be different from
$\emptyset$ and then $\alpha$ will be $> 0$. The objective then becomes that of choosing $B$ so
that $\alpha$ will have a preassigned acceptable value (such as 0.005, 0.01, 0.05) and,
subject to this restriction, the probabilities of type II error are minimized. That
is,


$$\beta(\theta) = P_\theta(X_1, \ldots, X_n \text{ lie in } B^c) \text{ is minimum for each } \theta \in \omega^c. \tag{6}$$


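Since $P_\theta(X_1, \ldots, X_n \text{ lie in } B^c) = 1 - P_\theta(X_1, \ldots, X_n \text{ lie in } B)$, the minimization
in (6) is equivalent to the maximization of


$$P_\theta(X_1, \ldots, X_n \text{ lie in } B) = 1 - P_\theta(X_1, \ldots, X_n \text{ lie in } B^c) \quad \text{for all } \theta \in \omega^c.$$

The function $\pi(\theta)$, $\theta \in \omega^c$, defined by:

$$\pi(\theta) = P_\theta(X_1, \ldots, X_n \text{ lie in } B), \quad \theta \in \omega^c, \tag{7}$$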


is called the power of the test employed. So, power of a test $= 1 -$ probability
of a type II error, and we may summarize our objective as follows: Choose a test
with a preassigned level of significance $\alpha$, which has maximum power among
all tests with level of significance $\le \alpha$. In other words, if $\varphi$ is the desirable test,
then it should satisfy the requirements:


The level of significance of $\varphi$ is $\alpha$, and its power, to be denoted by $\pi_\varphi(\theta)$, $\theta \in \omega^c$,
satisfies the inequality $\pi_\varphi(\theta) \ge \pi_{\varphi^*}(\theta)$ for all $\theta \in \omega^c$ and any test $\varphi^*$ with level
of significance $\le \alpha$.
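
The quantities in (3), (5), and (7) can be computed directly in a simple case.
The sketch below is an illustration under assumed values, not an example from
the text: $n = 20$ observations from $B(1, \theta)$, $\omega = (0, 0.5]$, and the
nonrandomized test that rejects when $T = X_1 + \cdots + X_n \ge 15$. The cut-off 15
and the grid over $\omega$ are arbitrary choices.

```python
from math import comb

def reject_prob(theta, n=20, c=15):
    """P_theta(T >= c) for T ~ B(n, theta): probability that the sample falls in B."""
    return sum(comb(n, t) * theta**t * (1 - theta)**(n - t) for t in range(c, n + 1))

# H0: theta in omega = (0, 0.5];  alternative: theta in (0.5, 1).
omega_grid = [k / 100 for k in range(1, 51)]
level = max(reject_prob(th) for th in omega_grid)            # max type I error probability
power = {th: reject_prob(th) for th in (0.6, 0.7, 0.8, 0.9)}  # pi(theta) on the alternative
type2 = {th: 1 - p for th, p in power.items()}                # beta(theta) = 1 - pi(theta)

print(f"level of significance: {level:.4f}")
for th in power:
    print(f"theta={th}: power={power[th]:.4f}, type II error={type2[th]:.4f}")
```

The maximum over $\omega$ is attained at the boundary value $\theta = 0.5$, because the
rejection probability increases with $\theta$.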


Such a test $\varphi$, should it exist, is called Uniformly Most Powerful (UMP) for
obvious reasons. (The term “most powerful” is explained by the inequality
$\pi_\varphi(\theta) \ge \pi_{\varphi^*}(\theta)$, and the term “uniformly” is due to the fact that this inequality
must hold for all $\theta \in \omega^c$.) If $\omega^c$ consists of a single point, then the concept of
uniformity is void, and we talk simply of a Most Powerful (MP) test.

The concepts introduced so far hold for a parameter of any dimensionality.
However, UMP tests can be constructed only when $\theta$ is a real-valued parameter,
and then only for certain forms of $H_0$ and $H_A$ and specific p.d.f.'s $f(\cdot\,; \theta)$. If the
parameter is multidimensional, desirable tests can still be constructed; they
are not going to be, in general, UMP tests, but they are derived, nevertheless,
on the basis of principles which are intuitively satisfactory. Preeminent among
such tests are the so-called Likelihood Ratio (LR) tests. Another class of tests
are the so-called goodness-of-fit tests, and still others are constructed on the
basis of decision-theoretic concepts. Some of the tests mentioned above will
be discussed more extensively in Chapters 11 and 12. Here, we conclude this
section with the introduction of an LR test.

On the basis of the random sample $X_1, \ldots, X_n$ with p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}^r$,
$r \ge 1$, suppose we wish to test the hypothesis $H_0 : \theta \in \omega$ (a proper) subset
of $\Omega$. It is understood that the alternative is $H_A : \theta \in \omega^c$, but in the present
framework it is not explicitly stated. Let $x_1, \ldots, x_n$ be the observed values
of $X_1, \ldots, X_n$ and form the likelihood function $L(\theta) = L(\theta \mid x_1, \ldots, x_n) =
\prod_{i=1}^{n} f(x_i; \theta)$. Maximize $L(\theta)$ and denote the resulting maximum by $L(\hat{\Omega})$.
This maximization happens when $\theta$ is equal to the MLE $\hat{\theta} = \hat{\theta}(x_1, \ldots, x_n)$, so
that $L(\hat{\Omega}) = L(\hat{\theta})$. Next, maximize the likelihood $L(\theta)$ under the restriction
that $\theta \in \omega$, and denote the resulting maximum by $L(\hat{\omega})$. Denote by $\hat{\theta}_\omega$ the MLE
of $\theta$ subject to the restriction that $\theta \in \omega$. Then $L(\hat{\omega}) = L(\hat{\theta}_\omega)$. Assume now
that $L(\theta)$ is continuous (in $\theta$), and suppose that the true value of $\theta$, call it
$\theta_0$, is in $\omega$. It is a property of an MLE that it gets closer and closer to the true
parameter as the sample size $n$ increases. Under the assumption that $\theta_0 \in \omega$,
it follows that both $\hat{\theta}$ and $\hat{\theta}_\omega$ will be close to $\theta_0$ and therefore close to each
other. Then, by the assumed continuity of $L(\theta)$, the quantities $L(\hat{\theta})$ and $L(\hat{\theta}_\omega)$
are close together, so that the ratio


$$\lambda(x_1, \ldots, x_n) = \lambda = L(\hat{\theta}_\omega)/L(\hat{\theta}) \tag{8}$$


(which is always $\le 1$) is close to 1. On the other hand, if $\theta_0 \in \omega^c$, then $\hat{\theta}$ and $\hat{\theta}_\omega$
are not close together, and therefore $L(\hat{\theta})$ and $L(\hat{\theta}_\omega)$ need not be close either.
Thus, the ratio $L(\hat{\theta}_\omega)/L(\hat{\theta})$ need not be close to 1. These considerations lead
to the following test:


Reject $H_0$ when $\lambda < \lambda_0$, where $\lambda_0$ is a constant to be determined. (9)


By the monotonicity of the function $y = \log x$, the inequality $\lambda < \lambda_0$ is equivalent
to $-2\log\lambda(x_1, \ldots, x_n) > C\ (= -2\log\lambda_0)$. It is seen in Chapter 11 that an
approximate determination of $C$ is made by the fact that, under certain conditions,
the distribution of $-2\log\lambda(X_1, \ldots, X_n)$ is approximately $\chi^2_f$, where $f =$ dimension of
$\Omega\ -$ dimension of $\omega$. Namely:


Reject $H_0$ when $-2\log\lambda > C$, where $C \simeq \chi^2_{f;\alpha}$. (10)
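
As an illustration of (8) through (10), the sketch below carries out the LR test in
the $B(1, \theta)$ model with $\Omega = (0, 1)$ and $\omega = \{\theta_0\}$, so that $f = 1 - 0 = 1$. The
data summary ($t = 60$ successes in $n = 100$ trials) and $\theta_0 = 0.5$ are assumptions made
for the example. For $f = 1$ the cut-off is obtained from the fact that a $\chi^2_1$ r.v. is the
square of an $N(0, 1)$ r.v.

```python
from math import log
from statistics import NormalDist

def minus_two_log_lambda(t, n, theta0):
    """-2 log(lambda) for H0: theta = theta0 in the B(1, theta) model (0 < t < n)."""
    theta_hat = t / n                          # unrestricted MLE over Omega

    def loglik(theta):
        return t * log(theta) + (n - t) * log(1 - theta)

    # Under H0 the restricted maximum is simply the likelihood at theta0.
    return -2 * (loglik(theta0) - loglik(theta_hat))

alpha = 0.05
C = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # upper alpha point of chi-square, f = 1
stat = minus_two_log_lambda(t=60, n=100, theta0=0.5)
reject = stat > C
print(f"-2 log(lambda) = {stat:.4f}, C = {C:.4f}, reject H0: {reject}")
```

For $f > 1$ the cut-off would have to come from a $\chi^2$ table or a numerical
library; the shortcut used here is specific to $f = 1$.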


In closing this section, it is to be mentioned that the concept of P-value is 
another way of looking at a test in an effort to assess how strong (or weak) 
the rejection of a hypothesis is. The P-value (probability value) of a test is
defined to be the smallest probability at which the hypothesis tested would 
be rejected for the data at hand. Roughly put, the P-value of a test is the 
probability, calculated under the null hypothesis, when the observed value of 
the test statistic is used as if it were the cut-off point of the test. The P-value of 
a test often accompanies a null hypothesis which is rejected, as an indication 
of the strength or weakness of rejection. The smaller the P-value, the stronger 
the rejection of the null hypothesis, and vice versa. More about it in Chapter 11. 
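
A small numerical sketch of the P-value, under assumptions made for illustration
only: $n = 10$ observations from $B(1, \theta)$, the null value $\theta_0 = 0.5$, a test that
rejects for large values of $T = X_1 + \cdots + X_n$, and an observed value $t = 9$. The
observed value is used as if it were the cut-off point.

```python
from math import comb

def p_value_upper(t_obs, n, theta0):
    """P_{theta0}(T >= t_obs) for T ~ B(n, theta0): P-value when large T rejects H0."""
    return sum(comb(n, t) * theta0**t * (1 - theta0)**(n - t)
               for t in range(t_obs, n + 1))

p = p_value_upper(t_obs=9, n=10, theta0=0.5)
print(f"P-value = {p:.6f}")
```

Here the P-value equals $11/1024$, so $H_0$ would be rejected at level 0.05 but not
at level 0.01.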
In the last three sections, we discussed the general principles of point
estimation, interval estimation, and testing hypotheses in a general setup. These
principles apply, in particular, in specific models. Two such models are Regression
models and Analysis of Variance models.

A regression model arises in situations such as those described in Examples
22 and 23 in Chapter 1. Its simplest form is as follows: At fixed points $x_1, \ldots, x_n$,
respective measurements $y_1, \ldots, y_n$ are taken, which may be subject to an
assortment of random errors $e_1, \ldots, e_n$. Thus, the $y_i$'s are values of r.v.'s $Y_i$'s,
which may often be assumed to have the structure: $Y_i = \beta_1 + \beta_2 x_i + e_i$, $i =
1, \ldots, n$; here $\beta_1$ and $\beta_2$ are parameters (unknown constants) of the model. For
the random errors $e_i$, it is not unreasonable to assume that $Ee_i = 0$; we also
assume that they have the same variance, $\mathrm{Var}(e_i) = \sigma^2 \in (0, \infty)$. Furthermore,
it is reasonable to assume that the $e_i$'s are i.i.d. r.v.'s, which implies independence
of the r.v.'s $Y_1, \ldots, Y_n$. It should be noted, however, that the $Y_i$'s are not
identically distributed, since, for instance, they have different expectations:
$EY_i = \beta_1 + \beta_2 x_i$, $i = 1, \ldots, n$. Putting these assumptions together, we arrive
at the following simple linear regression model.

$$Y_i = \beta_1 + \beta_2 x_i + e_i, \text{ the } e_i\text{'s are i.i.d. with } Ee_i = 0 \text{ and } \mathrm{Var}(e_i) = \sigma^2,\ i = 1, \ldots, n. \tag{11}$$


The quantities $\beta_1$, $\beta_2$, and $\sigma^2$ are the parameters of the model; the $Y_i$'s are
independent but not identically distributed; also, $EY_i = \beta_1 + \beta_2 x_i$ and $\mathrm{Var}(Y_i) =
\sigma^2$, $i = 1, \ldots, n$.

The term “regression” derives from the way the $Y_i$'s are produced from the
$x_i$'s, and the term “linear” indicates that the parameters $\beta_1$ and $\beta_2$ enter into
the model raised to the first power.

The main problems in connection with model (11) are to estimate the
parameters $\beta_1$, $\beta_2$, and $\sigma^2$; construct confidence intervals for $\beta_1$ and $\beta_2$; test
hypotheses about $\beta_1$ and $\beta_2$; and predict the expected value $EY_{i_0}$ (or the value
itself $Y_{i_0}$) corresponding to an $x_{i_0}$, distinct, in general, from $x_1, \ldots, x_n$.
Estimates of $\beta_1$ and $\beta_2$, the Least Squares Estimates (LSE's), can be constructed
without any further assumptions; the same for an estimate of $\sigma^2$. For the
remaining parts, however, there is a need to stipulate a distribution for the
$e_i$'s. Since the $e_i$'s are random errors, it is reasonable to assume that they are
Normally distributed; this then implies Normal distribution for the $Y_i$'s. Thus,
model (11) now becomes:


$$Y_i = \beta_1 + \beta_2 x_i + e_i, \text{ the } e_i\text{'s are independently distributed as } N(0, \sigma^2),\ i = 1, \ldots, n. \tag{12}$$


Under model (12), the MLE's of $\beta_1$, $\beta_2$, and $\sigma^2$ are derived, and their
distributions are determined. This allows us to pursue the resolution of the parts of
constructing confidence intervals, testing hypotheses, and of prediction. The
relevant discussion is presented in Chapter 13.
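
Although the derivation is deferred to Chapter 13, the LSE's for model (11) have
well-known closed forms, $\hat{\beta}_2 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and
$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$. The sketch below applies these standard formulas (stated here
as an assumption, since this section does not give them) to made-up measurements.

```python
from statistics import fmean

def least_squares(x, y):
    """Least Squares Estimates (b1, b2) of the line y = b1 + b2 * x."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx               # slope estimate
    b1 = ybar - b2 * xbar        # intercept estimate
    return b1, b2

x = [1.0, 2.0, 3.0, 4.0, 5.0]    # fixed points x_1, ..., x_n
y = [3.1, 4.9, 7.2, 8.8, 11.0]   # made-up measurements y_1, ..., y_n
b1, b2 = least_squares(x, y)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

No distributional assumption on the errors is used in this computation, in line
with the remark that the LSE's require none.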


8.5 The Basics of Analysis of Variance


Analysis of Variance (ANOVA) is a powerful technique, which provides the 
means of assessing and/or comparing several entities. ANOVA can be used 
effectively in many situations; in particular, it can be used in assessing and/or 
comparing crop yields corresponding to different soil treatments; crop yields 
corresponding to different soils and fertilizers; for the comparison of a certain
brand of gasoline with or without an additive by using it in several cars; the 
comparison of different brands of gasoline by using them in several cars; the 
comparison of the wearing of different materials; the comparison of the effect 
of different types of oil on the wear of several piston rings, etc.; the comparison 
of the yields of a chemical substance by using different catalytic methods; 
the comparison of the strengths of certain objects made of different batches 
of some material; the comparison of test scores from different schools and 
different teachers, etc.; and identification of the melting point of a metal by 
using different thermometers. Example 24 in Chapter 1 provides another case 
where ANOVA techniques are appropriate. 

Assessment and comparisons are done by way of point estimation, interval 
estimation, and testing hypotheses, as these techniques apply to the specific 
ANOVA models to be considered. The more factors involved in producing an 
outcome, the more complicated the model becomes. However, the basic ideas 
remain the same throughout. 

For the sake of illustrating the issues involved, consider the so-called one-way
layout or one-way classification model. Consider one kind of gasoline,
for example, unleaded regular gasoline, and suppose we supply ourselves with
amounts of this gasoline, purchased from $I$ different companies. The objective
is to compare these $I$ brands of gasoline from the viewpoint of yield. To this end, a
car (or several but pretty similar cars) operates under each one of the $I$ brands
of gasoline for $J$ runs in each case. Let $Y_{ij}$ be the number of miles per hour
for the $j$th run when the $i$th brand of gasoline is used. Then the $Y_{ij}$'s are r.v.'s
for which the following structure is assumed: For a given $i$, the actual number
of miles per hour for the $j$th run varies around a mean value $\mu_i$, and these
variations are due to an assortment of random errors $e_{ij}$. In other words, it
makes sense to assume that $Y_{ij} = \mu_i + e_{ij}$. It is also reasonable to assume
that the random errors $e_{ij}$ are independent r.v.'s distributed as $N(0, \sigma^2)$, with some
unknown variance $\sigma^2$. Thus, we have stipulated the following model:


$$Y_{ij} = \mu_i + e_{ij}, \text{ where the } e_{ij}\text{'s are independently} \sim N(0, \sigma^2),\ i = 1, \ldots, I\ (\ge 2),\ j = 1, \ldots, J\ (\ge 2). \tag{13}$$

The quantities $\mu_i$, $i = 1, \ldots, I$, and $\sigma^2$ are the parameters of the model.


It follows that the r.v.'s $Y_{ij}$ are independent and $Y_{ij} \sim N(\mu_i, \sigma^2)$, $j =
1, \ldots, J$, $i = 1, \ldots, I$.

The issues of interest here are those of estimating the $\mu_i$'s (mean number
of miles per hour for the $i$th brand of gasoline) and $\sigma^2$. Also, we wish to test the
hypothesis that there is really no difference between these $I$ different brands of
gasoline; in other words, test $H_0 : \mu_1 = \cdots = \mu_I\ (= \mu$, say, unknown). Should
this hypothesis be rejected, we would wish to identify the brands of gasoline
which cause the rejection. This can be done by constructing a confidence
interval for certain linear combinations of the $\mu_i$'s called contrasts. That is,
$\sum_{i=1}^{I} c_i \mu_i$, where $c_1, \ldots, c_I$ are constants with $\sum_{i=1}^{I} c_i = 0$.
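
The estimation part of the one-way layout can be sketched numerically. The code
below is an illustration with made-up data for $I = 3$ brands and $J = 3$ runs; it
estimates each $\mu_i$ by the mean of row $i$ and $\sigma^2$ by the pooled within-row sum of
squares divided by $I(J - 1)$. That divisor is a common unbiased choice and is an
assumption here, since this section does not specify the estimate; the function
names are hypothetical.

```python
from statistics import fmean

def one_way_summary(y):
    """y[i][j] is the j-th run with the i-th brand (I rows, J columns)."""
    I, J = len(y), len(y[0])
    means = [fmean(row) for row in y]                          # estimates of mu_i
    ss_within = sum((v - m) ** 2 for row, m in zip(y, means) for v in row)
    var_hat = ss_within / (I * (J - 1))                        # pooled estimate of sigma^2
    return means, var_hat

def contrast(means, c):
    """Estimate of the contrast sum(c_i * mu_i); the c_i must sum to 0."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to 0")
    return sum(ci * m for ci, m in zip(c, means))

y = [[20.0, 22.0, 21.0],
     [25.0, 24.0, 26.0],
     [20.0, 21.0, 22.0]]                                       # made-up data
means, var_hat = one_way_summary(y)
diff_12 = contrast(means, [1, -1, 0])                          # estimate of mu_1 - mu_2
print(means, var_hat, diff_12)
```

The contrast with coefficients $(1, -1, 0)$ compares the first two brands; a
confidence interval for it requires the distribution theory of Chapter 14.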

Instead of having one factor (gasoline brand) affecting the outcome (number
of miles per hour), there may be two (or more) such factors. For example,
there might be some chemical additives meant to enhance the mileage. In this
framework, suppose there are $J$ such chemical additives, and let us combine
each one of the $I$ brands of gasoline with each one of the $J$ chemical additives.
For simplicity, suppose we take just one observation, $Y_{ij}$, on each one of the
$IJ$ pairs. Then it makes sense to assume that the r.v. $Y_{ij}$ is the result of the
following additive components: A basic quantity (grand mean) $\mu$, the same
for all $i$ and $j$; an effect $\alpha_i$ due to the $i$th brand of gasoline (the $i$th row effect);
an effect $\beta_j$ due to the $j$th chemical additive (the $j$th column effect); and, of
course, the random error $e_{ij}$ due to a host of causes. So, the assumed model
is then: $Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$. As usual, we assume that the $e_{ij}$'s are
independent $\sim N(0, \sigma^2)$ with some (unknown) variance $\sigma^2$, which implies that the
$Y_{ij}$'s are independent r.v.'s and $Y_{ij} \sim N(\mu + \alpha_i + \beta_j, \sigma^2)$. We further assume
that some of the $\alpha_i$ effects are $\ge 0$, some are $\le 0$, and on the whole $\sum_{i=1}^{I} \alpha_i = 0$;
and likewise for the $\beta_j$ effects: $\sum_{j=1}^{J} \beta_j = 0$. Summarizing these assumptions,
we have then:


$$Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}, \text{ where the } e_{ij}\text{'s are independently} \sim N(0, \sigma^2),\ i = 1, \ldots, I\ (\ge 2),\ j = 1, \ldots, J\ (\ge 2),\quad \sum_{i=1}^{I} \alpha_i = 0,\ \sum_{j=1}^{J} \beta_j = 0. \tag{14}$$

The quantities $\mu$, $\alpha_i$, $i = 1, \ldots, I$, $\beta_j$, $j = 1, \ldots, J$, and $\sigma^2$ are the parameters
of the model.


As already mentioned, the implication is that the r.v.'s $Y_{ij}$ are independent
and $Y_{ij} \sim N(\mu + \alpha_i + \beta_j, \sigma^2)$, $i = 1, \ldots, I$, $j = 1, \ldots, J$.

The model described by (14) is called two-way layout or two-way classi- 
fication, as the observations are affected by two factors. 

The main statistical issues are those of estimating the parameters involved
and testing irrelevance of either one of the factors involved; that is, testing
$H_{0,A} : \alpha_1 = \cdots = \alpha_I = 0$, $H_{0,B} : \beta_1 = \cdots = \beta_J = 0$. Details will be presented in
Chapter 14. There, an explanation of the term “ANOVA” will also be given.
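
For a numerical feel of model (14), the sketch below computes the natural
estimates of $\mu$, the $\alpha_i$'s, and the $\beta_j$'s from a made-up $2 \times 3$ table: the grand
mean, row means minus the grand mean, and column means minus the grand mean.
These formulas are the usual ones for this layout but are stated here as an
assumption, since the section defers the details to Chapter 14.

```python
from statistics import fmean

def two_way_effects(y):
    """y[i][j]: single observation for brand i and additive j."""
    I, J = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)                        # estimate of mu
    alpha = [fmean(row) - grand for row in y]                         # row effects
    beta = [fmean(y[i][j] for i in range(I)) - grand for j in range(J)]  # column effects
    return grand, alpha, beta

y = [[10.0, 12.0, 14.0],
     [14.0, 16.0, 18.0]]                                              # made-up data
grand, alpha, beta = two_way_effects(y)
print(grand, alpha, beta)
```

By construction the estimated row effects sum to 0, and so do the estimated
column effects, matching the side conditions in (14).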


8.6 The Basics of Nonparametric Inference


All of the problems discussed in the previous sections may be summarized as
follows: On the basis of a random sample of size $n$, $X_1, \ldots, X_n$, drawn from the
p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, construct a point estimate for $\theta$, a confidence interval
for $\theta$, and test hypotheses about $\theta$. In other words, the problems discussed were
those of making (statistical) inference about $\theta$. These problems are suitably
modified for a multidimensional parameter. The fundamental assumption in
this framework is that the functional form of the p.d.f. $f(\cdot\,; \theta)$ is known; the
only thing which does not render $f(\cdot\,; \theta)$ completely known is the presence of
the (unknown constant) parameter $\theta$.

In many situations, stipulating a functional form for $f(\cdot\,; \theta)$ either is dictated
by circumstances or is the product of accumulated experience. In the absence
of these, we must still proceed with the problems of estimating important
quantities, either by points or by intervals, and testing hypotheses about them.
However, the framework now is nonparametric, and the relevant inference is
referred to as nonparametric inference.

Actually, there have been at least three cases so far where nonparametric
estimation was made without referring to it as such. Indeed, if $X_1, \ldots, X_n$ are
i.i.d. r.v.'s with unknown mean $\mu$, then the sample mean $\bar{X}_n$ may be taken as
an estimate of $\mu$, regardless of what the underlying distribution of the $X_i$'s is.
This estimate is recommended on the basis of at least three considerations.
First, it is unbiased, $E\bar{X}_n = \mu$, no matter what the underlying distribution is;
second, $\bar{X}_n$ is the moment estimate of $\mu$; and third, by the WLLN,
$\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$, so
that $\bar{X}_n$ is close to $\mu$, in the sense of probability, for all sufficiently large $n$. Now
suppose that the $X_i$'s also have (an unknown) variance $\sigma^2 \in (0, \infty)$. Then the
sample variance $S_n^2$ can be used as an estimate of $\sigma^2$, because it is unbiased
(Section 8.1) and also $S_n^2 \xrightarrow{P} \sigma^2$ as $n \to \infty$. Furthermore, by combining $\bar{X}_n$ and $S_n^2$ and
using Theorem 7(ii) in Chapter 7, we have that $\sqrt{n}(\bar{X}_n - \mu)/S_n \simeq N(0, 1)$ for
large $n$. Then, for such $n$, $\left[\bar{X}_n - z_{\alpha/2}\frac{S_n}{\sqrt{n}},\ \bar{X}_n + z_{\alpha/2}\frac{S_n}{\sqrt{n}}\right]$ is a confidence interval for
$\mu$ with confidence coefficient approximately $1 - \alpha$.
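
The large-sample interval just described needs only the sample mean and the
sample standard deviation. The sketch below computes it; the function name
`mean_ci` and the simulated data are illustrative assumptions, and the sample
variance is taken with divisor $n - 1$ (the unbiased version).

```python
import random
from statistics import NormalDist, fmean, stdev

def mean_ci(data, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean, for large n."""
    n = len(data)
    xbar, s = fmean(data), stdev(data)            # stdev uses the divisor n - 1
    half = NormalDist().inv_cdf(1 - alpha / 2) * s / n ** 0.5
    return xbar - half, xbar + half

rng = random.Random(3)
sample = [rng.expovariate(0.5) for _ in range(400)]   # made-up non-Normal data, mean 2
print(mean_ci(sample))
```

No functional form for the underlying distribution enters the computation; the
approximation relies on $n$ being large.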

Also, the (unknown) d.f. $F$ of the $X_i$'s has been estimated at every point
$x \in \mathbb{R}$ by the empirical d.f. $F_n$ (see relation (1) in Chapter 7). The estimate $F_n$
has at least two desirable properties. For all $x \in \mathbb{R}$ and regardless of the form
of the d.f. $F$: $EF_n(x) = F(x)$ and $F_n(x) \xrightarrow{P} F(x)$ as $n \to \infty$.
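
The empirical d.f. is simple to compute: $F_n(x)$ is the proportion of observations
that are $\le x$. A minimal sketch follows (the helper name `empirical_df` is
hypothetical and the sample is made up).

```python
from bisect import bisect_right

def empirical_df(sample):
    """Return the function F_n with F_n(x) = (number of observations <= x) / n."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        return bisect_right(xs, x) / n     # count of sorted values <= x, divided by n

    return F_n

F_n = empirical_df([3.0, 1.0, 2.0, 2.0])
print([F_n(x) for x in (0.0, 1.0, 2.0, 2.5, 3.0)])
```

$F_n$ is a step function that jumps by $k/n$ at a value observed $k$ times.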

What has not been done so far is to estimate the p.d.f. $f(x)$ at each $x \in \mathbb{R}$,
under certain regularity conditions, which do not include postulation of a 
functional form for f. There are several ways of doing this; in Chapter 15, we 
are going to adopt the so-called kernel method of estimating f. Some desirable 
results of the proposed estimate will be stated without proofs. 

Regarding testing hypotheses, the problems to be addressed in Chapter 15
will be to test the hypothesis that the (unknown) d.f. $F$ is, actually, equal to a
known one $F_0$; that is, $H_0 : F = F_0$, the alternative $H_A$ being that $F(x) \ne F_0(x)$
for at least one $x \in \mathbb{R}$. Actually, from a practical viewpoint, it is more important
to compare two (unknown) d.f.'s $F$ and $G$, by stipulating $H_0 : F = G$. The
alternative can be any one of the following: $H_A : F \ne G$, $H_A' : F > G$, $H_A'' : F < G$,
in the sense that $F(x) \ge G(x)$ or $F(x) \le G(x)$, respectively, for all $x \in \mathbb{R}$, and
strict inequality for at least one $x$. In carrying out the appropriate tests, one
has to use some pretty sophisticated asymptotic results regarding empirical
d.f.'s. An alternative approach to using empirical d.f.'s is to employ the concept
of a rank test or the concept of a sign test. These things will be discussed
to some extent in Chapter 15. That chapter is concluded with the basics of
regression estimation but in a nonparametric framework. In such a situation,
what is estimated is an entire function rather than a few parameters. Some
basic results are stated in Chapter 15.
Point Estimation


In the previous chapter, the basic terminology and concepts of parametric 
point estimation were introduced briefly. In the present chapter, we are going 
to elaborate extensively on this matter. For brevity, we will use the term es- 
timation rather than parametric point estimation. The methods of estimation 
to be discussed here are those listed in the first section of the previous chap- 
ter; namely, maximum likelihood estimation, estimation through the concepts 
of unbiasedness and minimum variance (which lead to uniformly minimum 
variance estimates), estimation based on decision-theoretic concepts, and es- 
timation by the method of moments. The method of estimation by way of the 
principle of least squares is commonly used in the so-called linear models. 
Accordingly, it is deferred to Chapter 13. 

Before we embark on the mathematical derivations, it is imperative to keep
in mind the big picture; namely, why do we do what we do? A brief description
is as follows. Let $X$ be a r.v. with p.d.f. $f(\cdot\,; \theta)$, where $\theta$ is a parameter lying
in a parameter space $\Omega$. It is assumed that the functional form of the p.d.f. is
completely known. So, if $\theta$ were known, then the p.d.f. would be known, and
consequently we could calculate, in principle, all probabilities related to $X$, the
expectation of $X$, its variance, etc. The problem, however, is that most often in
practice (and in the present context) $\theta$ is not known. Then the objective is to
estimate $\theta$ on the basis of a random sample of size $n$ from $f(\cdot\,; \theta)$, $X_1, \ldots, X_n$.
Then, replacing $\theta$ in $f(\cdot\,; \theta)$ by a “good” estimate of it, one would expect to
be able to use the resulting p.d.f. for the purposes described above to a
satisfactory degree.


9.1 Maximum Likelihood Estimation: Motivation and Examples


The following simple example is meant to shed light on the intuitive, yet quite
logical, principle of Maximum Likelihood Estimation.

EXAMPLE 1 Let $X_1, \ldots, X_{10}$ be i.i.d. r.v.'s from the $B(1, \theta)$ distribution, $0 < \theta < 1$, and let
$x_1, \ldots, x_{10}$ be the respective observed values. For convenience, set $t =
x_1 + \cdots + x_{10}$. Further, suppose that in the 10 trials, 6 resulted in successes,
so that $t = 6$. Then the likelihood function involved is: $L(\theta \mid \mathbf{x}) = \theta^6(1 - \theta)^4$,
$0 < \theta < 1$, $\mathbf{x} = (x_1, \ldots, x_{10})$. Thus, $L(\theta \mid \mathbf{x})$ is the probability of observing
exactly 6 successes in 10 independent Binomial trials, the successes occurring
on those trials for which $x_i = 1$, $i = 1, \ldots, 10$; this probability is a function of
the (unknown) parameter $\theta$. Let us calculate the values of this probability for
$\theta$ ranging from 0.1 to 0.9. We find:


Values of θ     Values of L(θ | x)


0.1             0.000000656
0.2             0.0000262
0.3             0.000175
0.4             0.000531
0.5             0.000977
0.6             0.00119
0.7             0.000953
0.8             0.000419
0.9             0.0000531


We observe that the values of $L(\theta \mid \mathbf{x})$ keep increasing up to $\theta = 0.6$, where the
maximum value is attained, and then the values keep decreasing. Thus, if these 9 values
were the only possible values for $\theta$ (which they are not!), one would reasonably
enough choose the value of 0.6 as the value of $\theta$. The value $\theta = 0.6$ has
the distinction of maximizing (among the 9 values listed) the probability of
attaining the 6 already observed successes.
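
The values of the likelihood can be recomputed directly from the formula
$L(\theta \mid \mathbf{x}) = \theta^6 (1 - \theta)^4$. The short sketch below does so on the same grid
of nine values and picks out the maximizer; the variable names are illustrative.

```python
def likelihood(theta, t=6, n=10):
    """L(theta | x) = theta^t * (1 - theta)^(n - t) for t successes in n trials."""
    return theta**t * (1 - theta)**(n - t)

grid = [k / 10 for k in range(1, 10)]          # theta = 0.1, 0.2, ..., 0.9
table = {theta: likelihood(theta) for theta in grid}
best = max(table, key=table.get)               # grid value with the largest likelihood

for theta, value in table.items():
    print(f"{theta:.1f}  {value:.9f}")
print("maximizer on the grid:", best)
```

A finer grid would show the same thing: the likelihood peaks at $t/n = 0.6$.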


We observe that $0.6 = \frac{6}{10} = \frac{t}{n}$, where $n$ is the number of trials and $t$ is the
number of successes. It will be seen in Example 2 that the value $\frac{t}{n}$, actually,
maximizes the likelihood function among all values of $\theta$ with $0 < \theta < 1$. Then
$\frac{t}{n}$ will be the Maximum Likelihood Estimate of $\theta$, to be denoted by $\hat{\theta}$; i.e., $\hat{\theta} = \frac{t}{n}$.

In a general setting, let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with p.d.f. $f(\cdot\,; \theta)$ with $\theta \in \Omega$,
and let $x_1, \ldots, x_n$ be the respective observed values and $\mathbf{x} = (x_1, \ldots, x_n)$. The
likelihood function, $L(\theta \mid \mathbf{x})$, is given by $L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta)$, and a value
of $\theta$ which maximizes $L(\theta \mid \mathbf{x})$ is called a Maximum Likelihood Estimate
(MLE) of $\theta$. Clearly, the MLE depends on $\mathbf{x}$, and we usually write $\hat{\theta} = \hat{\theta}(\mathbf{x})$.
Thus,


$$L(\hat{\theta} \mid \mathbf{x}) = \max\{L(\theta \mid \mathbf{x});\ \theta \in \Omega\}. \tag{1}$$


The justification for choosing an estimate as the value of the parameter which 
maximizes the likelihood function is the same as that given in Example 1, 
when the r.v.'s are discrete. The same interpretation holds true for r.v.'s of 
the continuous type, by considering small intervals around the observed 
values. 
Once we decide to adopt the Maximum Likelihood Principle (i.e., the principle
of choosing an estimate of the parameter through the process of maximizing
the likelihood function), the actual identification of an MLE is a purely
mathematical problem; namely, that of maximizing a function. This maximization,
if possible at all, often (but not always) is done through differentiation.
Examples to be discussed below will illustrate various points.


Before embarking on specific examples, it must be stressed that, whenever 
a maximum is sought by differentiation, the second-order derivative(s) must 
also be examined in search of a maximum. Also, it should be mentioned that 
maximization of the likelihood function, which is the product of n factors, is 
equivalent to maximization of its logarithm (always with base e), which is the 
sum of n summands, thus much easier to work with. 


REMARK 1 Let us recall that a function $y = g(x)$ attains a maximum at a
point $x = x_0$, if $\frac{d}{dx} g(x)\big|_{x = x_0} = 0$ and $\frac{d^2}{dx^2} g(x)\big|_{x = x_0} < 0$.


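EXAMPLE 2

In terms of a random sample of size $n$, $X_1, \ldots, X_n$, from the $B(1, \theta)$ distribution
with observed values $x_1, \ldots, x_n$, determine the MLE $\hat{\theta} = \hat{\theta}(\mathbf{x})$ of $\theta \in (0, 1)$,
$\mathbf{x} = (x_1, \ldots, x_n)$.


DISCUSSION Since $f(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}$, $x_i = 0$ or $1$, $i = 1, \ldots, n$,
the likelihood function is

$$L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \theta^{t}(1 - \theta)^{n - t}, \quad t = x_1 + \cdots + x_n,$$

so that $t = 0, 1, \ldots, n$. Hence $\log L(\theta \mid \mathbf{x}) = t \log\theta + (n - t)\log(1 - \theta)$. From
the likelihood equation $\frac{\partial}{\partial\theta} \log L(\theta \mid \mathbf{x}) = \frac{t}{\theta} - \frac{n - t}{1 - \theta} = 0$, we obtain $\theta = \frac{t}{n}$.
Next, $\frac{\partial^2}{\partial\theta^2} \log L(\theta \mid \mathbf{x}) = -\frac{t}{\theta^2} - \frac{n - t}{(1 - \theta)^2}$, which is negative for all $\theta$ and hence for
$\theta = t/n$. Therefore, the MLE of $\theta$ is $\hat{\theta} = \frac{t}{n} = \bar{x}$.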
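
The closed-form answer $t/n$ can be checked numerically. The following is a minimal sketch, not part of the original text: the data and the grid resolution are hypothetical, and the function name `bernoulli_log_likelihood` is introduced here only for illustration. It compares $t/n$ with a brute-force maximization of the log-likelihood over a grid in $(0, 1)$.

```python
import math

def bernoulli_log_likelihood(theta, xs):
    """log L(theta | x) = t*log(theta) + (n - t)*log(1 - theta), with t = sum(xs)."""
    t, n = sum(xs), len(xs)
    return t * math.log(theta) + (n - t) * math.log(1.0 - theta)

# Hypothetical data: 6 successes in 10 trials.
xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

# Solution of the likelihood equation: theta = t/n.
closed_form = sum(xs) / len(xs)

# Brute-force check on a grid of theta values strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
grid_argmax = max(grid, key=lambda th: bernoulli_log_likelihood(th, xs))

print(closed_form, grid_argmax)
```

Because the log-likelihood is strictly concave here, the grid maximizer coincides with $t/n$ whenever $t/n$ is itself a grid point.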


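EXAMPLE 3

Determine the MLE $\hat{\theta} = \hat{\theta}(\mathbf{x})$ of $\theta \in (0, \infty)$ in the $P(\theta)$ distribution in terms of
the random sample $X_1, \ldots, X_n$ with observed values $x_1, \ldots, x_n$.


DISCUSSION Here $f(x_i; \theta) = \frac{e^{-\theta}\theta^{x_i}}{x_i!}$, $x_i = 0, 1, \ldots$, $i = 1, \ldots, n$, so that

$$\log L(\theta \mid \mathbf{x}) = \log\left(e^{-n\theta}\, \frac{\theta^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}\right)
= -n\theta + (\log\theta)\sum_{i=1}^{n} x_i - \log\left(\prod_{i=1}^{n} x_i!\right)
= -n\theta + (n\log\theta)\bar{x} - \log\left(\prod_{i=1}^{n} x_i!\right).$$

Hence $\frac{\partial}{\partial\theta} \log L(\theta \mid \mathbf{x}) = -n + \frac{n\bar{x}}{\theta} = 0$, which gives $\theta = \bar{x}$, and
$\frac{\partial^2}{\partial\theta^2} \log L(\theta \mid \mathbf{x}) = -\frac{n\bar{x}}{\theta^2} < 0$ for all $\theta$ and hence for $\theta = \bar{x}$. Therefore the MLE of $\theta$ is $\hat{\theta} = \bar{x}$.


EXAMPLE 4

Determine the MLE $\hat{\theta} = \hat{\theta}(\mathbf{x})$ of $\theta \in (0, \infty)$ in the Negative Exponential
distribution $f(x; \theta) = \theta e^{-\theta x}$, $x > 0$, on the basis of the random sample $X_1, \ldots, X_n$
with observed values $x_1, \ldots, x_n$.

DISCUSSION Since $f(x_i; \theta) = \theta e^{-\theta x_i}$, $x_i > 0$, $i = 1, \ldots, n$, we have
$\log L(\theta \mid \mathbf{x}) = \log(\theta^{n} e^{-n\bar{x}\theta}) = n\log\theta - n\bar{x}\theta$, so that
$\frac{\partial}{\partial\theta} \log L(\theta \mid \mathbf{x}) = \frac{n}{\theta} - n\bar{x} = 0$, and hence $\theta = 1/\bar{x}$. Furthermore,
$\frac{\partial^2}{\partial\theta^2} \log L(\theta \mid \mathbf{x}) = -\frac{n}{\theta^2} < 0$ for all $\theta$ and hence for $\theta = 1/\bar{x}$. It follows that $\hat{\theta} = 1/\bar{x}$.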
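
The likelihood equations of the Poisson and Negative Exponential models can also be solved iteratively, which is how one would proceed when no closed form is available. The sketch below is illustrative only: Newton's method is a technique swapped in here, not one used in the text, and the helper `newton_maximize`, the data, and the starting points are assumptions. For these two models the iteration stays positive and converges when the starting point lies in $(0, 2\bar{x})$ (Poisson) or $(0, 2/\bar{x})$ (Negative Exponential).

```python
def newton_maximize(score, curvature, theta0, tol=1e-12, max_iter=100):
    """Solve score(theta) = 0 by Newton's method; curvature is the derivative of score."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / curvature(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

xs = [2, 4, 3, 1, 5]                      # hypothetical observations, mean 3.0
n, x_bar = len(xs), sum(xs) / len(xs)

# Poisson model: d/dtheta log L = -n + n*x_bar/theta, second derivative -n*x_bar/theta^2.
poisson_hat = newton_maximize(lambda th: -n + n * x_bar / th,
                              lambda th: -n * x_bar / th ** 2,
                              theta0=1.0)

# Negative Exponential model: d/dtheta log L = n/theta - n*x_bar, second derivative -n/theta^2.
exponential_hat = newton_maximize(lambda th: n / th - n * x_bar,
                                  lambda th: -n / th ** 2,
                                  theta0=0.1)

print(poisson_hat, exponential_hat)
```

The two results agree with the closed forms $\bar{x}$ and $1/\bar{x}$ up to numerical tolerance.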


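EXAMPLE 5

Let $X_1, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution, where only
one of the parameters is known. Determine the MLE of the other (unknown)
parameter.


DISCUSSION With $x_1, \ldots, x_n$ being the observed values of $X_1, \ldots, X_n$, we
have:


(i) Let $\mu$ be unknown. Then

$$\log L(\mu \mid \mathbf{x}) = \log\left[\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right]
= -n\log(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

so that $\frac{\partial}{\partial\mu} \log L(\mu \mid \mathbf{x}) = \frac{n(\bar{x} - \mu)}{\sigma^2} = 0$, and hence $\mu = \bar{x}$. Furthermore,
$\frac{\partial^2}{\partial\mu^2} \log L(\mu \mid \mathbf{x}) = -\frac{n}{\sigma^2} < 0$ for all $\mu$ and hence for $\mu = \bar{x}$. It follows that the MLE of $\mu$ is
$\hat{\mu} = \bar{x}$.

(ii) Let $\sigma^2$ be unknown. Then

$$\log L(\sigma^2 \mid \mathbf{x}) = \log\left[\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right]
= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

so that $\frac{\partial}{\partial\sigma^2} \log L(\sigma^2 \mid \mathbf{x}) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$, and hence
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$; set $\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 = s^2$. Then

$$\frac{\partial^2}{\partial(\sigma^2)^2} \log L(\sigma^2 \mid \mathbf{x}) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{n}(x_i - \mu)^2
= \frac{n}{2(\sigma^2)^2} - \frac{2ns^2}{2(\sigma^2)^3},$$

so that

$$\frac{\partial^2}{\partial(\sigma^2)^2} \log L(\sigma^2 \mid \mathbf{x})\Big|_{\sigma^2 = s^2} = \frac{n}{2(s^2)^2} - \frac{2ns^2}{2(s^2)^3} = -\frac{n}{2s^4} < 0.$$

It follows that the MLE of $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$.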


In all of the preceding examples, the MLE's were determined through
differentiation. Below is a case where this method does not apply because,
simply, the derivative does not exist. As an introduction to the problem, let
$X \sim U(0, \theta)$ $(\theta > 0)$, so that the likelihood function is $L(\theta \mid x) = \frac{1}{\theta} I_{[x, \infty)}(\theta)$,
where it is to be recalled that the indicator function $I_A$ is defined by $I_A(x) = 1$
if $x \in A$, and $I_A(x) = 0$ if $x \in A^c$. The picture of $L(\cdot \mid x)$ is shown in Figure 9.1.


Figure 9.1 [Graph of the likelihood function $L(\theta \mid x)$; image not reproduced.]














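EXAMPLE 6

Let $X_1, \ldots, X_n$ be a random sample from the Uniform $U(\alpha, \beta)$ $(\alpha < \beta)$ distribution,
where only one of $\alpha$ and $\beta$ is unknown. Determine the MLE of the
(unknown) parameter.


DISCUSSION


(i) Let $\alpha$ be unknown. Since

$$f(x_i; \alpha) = \frac{1}{\beta - \alpha} I_{[\alpha, \beta]}(x_i), \quad i = 1, \ldots, n,$$

it follows that

$$L(\alpha \mid \mathbf{x}) = \frac{1}{(\beta - \alpha)^n} \prod_{i=1}^{n} I_{[\alpha, \beta]}(x_i) = \frac{1}{(\beta - \alpha)^n} I_{[\alpha, \beta]}(x_{(1)})\, I_{[\alpha, \beta]}(x_{(n)}),$$

where $x_{(1)} = \min(x_1, \ldots, x_n)$, $x_{(n)} = \max(x_1, \ldots, x_n)$; or

$$L(\alpha \mid \mathbf{x}) = \frac{1}{(\beta - \alpha)^n} I_{[\alpha, \infty)}(x_{(1)})\, I_{(-\infty, \beta]}(x_{(n)}). \tag{2}$$

Maximization of $L(\alpha \mid \mathbf{x})$ with respect to $\alpha$ means two things: maximization of
$I_{[\alpha, \infty)}(x_{(1)})$ and maximization of $1/(\beta - \alpha)^n$. The maximum value of the former
quantity is 1 and occurs as long as $\alpha \le x_{(1)}$. The latter quantity gets larger and
larger as $\alpha$ gets closer and closer to $\beta$. But always $\alpha \le x_{(1)} \le \beta$, and $\alpha$ is subject
to the restriction $\alpha \le x_{(1)}$. Thus, $\alpha$ gets closest to $\beta$ if $\alpha = x_{(1)}$. In other words,
the MLE of $\alpha$ is $\hat{\alpha} = x_{(1)}$.
(ii) Let $\beta$ be unknown. Relation (2) then becomes

$$L(\beta \mid \mathbf{x}) = \frac{1}{(\beta - \alpha)^n} I_{[\alpha, \infty)}(x_{(1)})\, I_{(-\infty, \beta]}(x_{(n)}),$$

whereas always $\alpha \le x_{(n)} \le \beta$. Then, arguing as in the first case, we have that
the MLE of $\beta$ is $\hat{\beta} = x_{(n)}$.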
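
The argument for part (i) can be illustrated by evaluating the likelihood at a few candidate values of $\alpha$. This is only a sketch under assumed data; `uniform_likelihood` and the candidate list are hypothetical names and values introduced for the illustration. The likelihood increases as $\alpha$ moves up toward $x_{(1)}$ and drops to 0 as soon as $\alpha$ exceeds $x_{(1)}$.

```python
def uniform_likelihood(alpha, beta, xs):
    """L(alpha, beta | x) for U(alpha, beta): (beta - alpha)^(-n) if every x_i lies in
    [alpha, beta], and 0 otherwise (this is the product of the indicator functions)."""
    if alpha >= beta or min(xs) < alpha or max(xs) > beta:
        return 0.0
    return (beta - alpha) ** (-len(xs))

xs = [2.3, 4.1, 3.7, 2.9, 4.8]     # hypothetical observations
beta_known = 5.0                   # beta assumed known, alpha unknown

alpha_hat = min(xs)                # the MLE x_(1)

# Likelihood at candidate alphas on both sides of x_(1) = 2.3.
candidates = [1.0, 1.5, 2.0, 2.3, 2.5, 3.0]
values = {a: uniform_likelihood(a, beta_known, xs) for a in candidates}

for a in candidates:
    print(a, values[a])
```

No derivative is involved: the maximizer sits at a point where the likelihood is discontinuous.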


In the examples discussed so far, there was a single parameter to be estimated.
In the examples presented below, the parameters to be estimated will
be two or more. If the maximization is to be done through differentiation, then
the following remark reminds us how this method is implemented.


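REMARK 2 The function $y = g(x_1, \ldots, x_k)$ attains a maximum at a point
$(x_{01}, \ldots, x_{0k})$, if the point $(x_{01}, \ldots, x_{0k})$ satisfies the system of the $k$ equations
$\frac{\partial}{\partial x_i} g(x_1, \ldots, x_k) = 0$, $i = 1, \ldots, k$, and, in addition, the point $(x_{01}, \ldots, x_{0k})$
renders the $k \times k$ matrix of the second-order derivatives $\left(\frac{\partial^2}{\partial x_i \partial x_j} g(x_1, \ldots, x_k)\right)$,
$i, j = 1, \ldots, k$, negative definite. What is meant by the term "negative definite"
is that the real-valued quantity below is $< 0$ for all nonzero vectors
$(\lambda_1, \ldots, \lambda_k)$ $(\lambda_1^2 + \cdots + \lambda_k^2 \ne 0)$; namely,

$$(\lambda_1, \ldots, \lambda_k)\left(\frac{\partial^2}{\partial x_i \partial x_j} g(x_1, \ldots, x_k)\Big|_{(x_1, \ldots, x_k) = (x_{01}, \ldots, x_{0k})}\right)\begin{pmatrix}\lambda_1 \\ \vdots \\ \lambda_k\end{pmatrix} < 0.$$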


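EXAMPLE 7

Refer to Example 5 and suppose that both $\mu$ and $\sigma^2$ are unknown. Determine
their MLE's.


DISCUSSION Here

$$\log L(\mu, \sigma^2 \mid \mathbf{x}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

and then the two likelihood equations produce the unique solution $\mu = \bar{x}$ and
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, which we may denote by $s^2$. Next, the $2 \times 2$ matrix of
the second-order derivatives, evaluated at $(\bar{x}, s^2)$, becomes

$$\begin{pmatrix} -\dfrac{n}{s^2} & 0 \\ 0 & -\dfrac{n}{2s^4} \end{pmatrix},$$

which is negative definite (see Exercise 1.2 below). Thus, $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
are the MLE's of $\mu$ and $\sigma^2$, respectively.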
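
The two-parameter recipe of Remark 2 can be traced numerically for the Normal model. The sketch below uses hypothetical data and helper names (`normal_mle`, `hessian_at_mle`, `is_negative_definite_2x2`), none of which appear in the text. It computes $(\bar{x}, s^2)$, evaluates the second-order derivatives of the log-likelihood there, and checks negative definiteness through the signs of the leading principal minors.

```python
def normal_mle(xs):
    """Return (x_bar, s^2); note the divisor n (not n - 1) in s^2."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    return mean, s2

def hessian_at_mle(xs):
    """Second-order derivatives of log L(mu, sigma^2 | x), evaluated at (x_bar, s^2)."""
    n = len(xs)
    mean, s2 = normal_mle(xs)
    d_mu_mu = -n / s2
    d_mu_s2 = -sum(x - mean for x in xs) / s2 ** 2          # zero up to rounding
    d_s2_s2 = n / (2 * s2 ** 2) - sum((x - mean) ** 2 for x in xs) / s2 ** 3
    return [[d_mu_mu, d_mu_s2], [d_mu_s2, d_s2_s2]]

def is_negative_definite_2x2(h):
    """A symmetric 2x2 matrix is negative definite iff D1 < 0 and D2 > 0."""
    return h[0][0] < 0 and h[0][0] * h[1][1] - h[0][1] * h[1][0] > 0

xs = [4.9, 5.3, 5.1, 4.7, 5.5]      # hypothetical observations
mu_hat, sigma2_hat = normal_mle(xs)
hessian = hessian_at_mle(xs)

print(mu_hat, sigma2_hat, is_negative_definite_2x2(hessian))
```

The diagonal entries reproduce $-n/s^2$ and $-n/(2s^4)$ from the discussion above.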


EXAMPLE 8

A Multinomial experiment is carried out independently $n$ times, so that the
likelihood function is

$$L(p_1, \ldots, p_r \mid \mathbf{x}) = \frac{n!}{x_1! \cdots x_r!}\, p_1^{x_1} \cdots p_r^{x_r},$$

where $x_i \ge 0$, $i = 1, \ldots, r$, integers, with $x_1 + \cdots + x_r = n$, and $0 < p_i < 1$,
$i = 1, \ldots, r$, with $p_1 + \cdots + p_r = 1$. Determine the MLE's of $p_i$, $i = 1, \ldots, r$.


DISCUSSION The number of independent parameters is $r - 1$, since, for
example, $p_r = 1 - p_1 - \cdots - p_{r-1}$. Looking at the $\log L(p_1, \ldots, p_r \mid \mathbf{x})$ and
taking partial derivatives with respect to $p_i$, $i = 1, \ldots, r - 1$ (and remembering
that $p_r = 1 - p_1 - \cdots - p_{r-1}$), we obtain

$$x_i \frac{1}{p_i} - x_r \frac{1}{p_r} = 0, \quad i = 1, \ldots, r - 1.$$

From these relations, the unique solution $p_i = \frac{x_i}{n}$, $i = 1, \ldots, r$, follows.
Next, the $(r - 1) \times (r - 1)$ matrix of the second-order derivatives, evaluated
at $p_i = x_i/n$, $i = 1, \ldots, r$, is given by

$$\begin{pmatrix}
-n^2\left(\dfrac{1}{x_1} + \dfrac{1}{x_r}\right) & -\dfrac{n^2}{x_r} & \cdots & -\dfrac{n^2}{x_r} \\
-\dfrac{n^2}{x_r} & -n^2\left(\dfrac{1}{x_2} + \dfrac{1}{x_r}\right) & \cdots & -\dfrac{n^2}{x_r} \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{n^2}{x_r} & -\dfrac{n^2}{x_r} & \cdots & -n^2\left(\dfrac{1}{x_{r-1}} + \dfrac{1}{x_r}\right)
\end{pmatrix},$$

which is seen to be negative definite. Consequently, the MLE's of $p_i$, $i = 1, \ldots, r$,
are $\hat{p}_i = \frac{x_i}{n}$, $i = 1, \ldots, r$. (Also, see Exercise 1.3 below.)
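
A quick numerical sanity check of the Multinomial result is sketched below. The counts, the random seed, and the helper names are assumptions made for illustration. The code compares the log-likelihood at $\hat{p}_i = x_i/n$ with its value at many randomly chosen probability vectors; none of them should do better.

```python
import math
import random

def multinomial_log_kernel(ps, xs):
    """Sum of x_i * log(p_i). The multinomial coefficient is omitted because it
    does not depend on (p_1, ..., p_r)."""
    return sum(x * math.log(p) for x, p in zip(xs, ps) if x > 0)

xs = [12, 7, 21, 10]                  # hypothetical cell counts
n = sum(xs)
p_hat = [x / n for x in xs]           # the MLE's x_i / n

rng = random.Random(0)                # fixed seed for reproducibility

def random_simplex_point(r):
    """A random vector of r positive numbers summing to 1."""
    ws = [rng.random() + 1e-9 for _ in range(r)]
    total = sum(ws)
    return [w / total for w in ws]

best = multinomial_log_kernel(p_hat, xs)
never_beaten = all(
    multinomial_log_kernel(random_simplex_point(len(xs)), xs) <= best + 1e-12
    for _ in range(2000)
)
print(p_hat, never_beaten)
```

A random search of this kind is not a proof, of course; it merely fails to find a counterexample.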


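EXAMPLE 9

Refer to Example 6, assume that both $\alpha$ and $\beta$ are unknown, and determine
their MLE's.


DISCUSSION Expression (1) becomes here as follows:

$$L(\alpha, \beta \mid \mathbf{x}) = \frac{1}{(\beta - \alpha)^n} I_{[\alpha, \infty)}(x_{(1)})\, I_{(-\infty, \beta]}(x_{(n)}). \tag{3}$$

Since always $\alpha \le x_{(1)} \le x_{(n)} \le \beta$, the right-hand side of (3) is maximized
if $I_{[\alpha, \infty)}(x_{(1)}) = 1$ and $I_{(-\infty, \beta]}(x_{(n)}) = 1$, which happen if $\alpha \le x_{(1)}$, $x_{(n)} \le \beta$,
and also if $\alpha$ and $\beta$ are as close together as possible. Clearly, this happens for
$\alpha = x_{(1)}$ and $\beta = x_{(n)}$. In other words, the MLE's of $\alpha$ and $\beta$ are $\hat{\alpha} = x_{(1)}$ and
$\hat{\beta} = x_{(n)}$.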


1.1 Refer to Example 6(ii) and justify the statement made there that $x_{(n)}$ is,
indeed, the MLE of $\beta$.


1.2 Show that the matrix $\begin{pmatrix} -\frac{n}{s^2} & 0 \\ 0 & -\frac{n}{2s^4} \end{pmatrix}$ in Example 7 is, indeed, negative
definite.


1.3 In reference to Example 8, show that:
(i) $p_i = \frac{x_i}{n}$, $i = 1, \ldots, r$, is, indeed, the unique solution of the system of
equations considered there.
(ii) The $(r - 1) \times (r - 1)$ matrix exhibited there is the matrix of the
second-order derivatives as stated.
(iii) The matrix in part (ii) is negative definite.
1.4 In reference to Example 18 below, show that Var,(S?) = a as stated 
there. 


1.5 If $X_1, \ldots, X_n$ are independent r.v.'s distributed as $B(k, \theta)$, $\theta \in \Omega = (0, 1)$,
with respective observed values $x_1, \ldots, x_n$, show that $\hat{\theta} = \frac{\bar{x}}{k}$ is the MLE
of $\theta$, where $\bar{x}$ is the sample mean of the $x_i$'s.
1.6 If the independent r.v.'s $X_1, \ldots, X_n$ have the Geometric p.d.f. $f(x; \theta) = \theta(1 - \theta)^{x - 1}$,
$x = 1, 2, \ldots$, $\theta \in \Omega = (0, 1)$, and respective observed values
$x_1, \ldots, x_n$, then show that $\hat{\theta} = 1/\bar{x}$ is the MLE of $\theta$.


1.7 On the basis of a random sample of size $n$ from the p.d.f. $f(x; \theta) = (\theta + 1)x^{\theta}$,
$0 < x < 1$, $\theta \in \Omega = (-1, \infty)$, derive the MLE of $\theta$.


1.8 On the basis of a random sample of size $n$ from the p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$,
$0 < x < 1$, $\theta \in \Omega = (0, \infty)$, derive the MLE of $\theta$.


1.9 (i) Show that the function f(x; 0) = ye", eR, 0 € Q = (0, 00) is 
a p.d.f. (the so-called Double Exponential p.d.f.), and draw its picture.


(ii) On the basis of a random sample from this p.d.f., derive the MLE 
of 6. 


1.10 (i) Verify that the function $f(x; \theta) = \theta^2 x e^{-\theta x}$, $x > 0$, $\theta \in \Omega = (0, \infty)$
is a p.d.f., by observing that it is the Gamma p.d.f. with parameters
$\alpha = 2$, $\beta = 1/\theta$.
(ii) On the basis of a random sample of size $n$ from this p.d.f., derive the
MLE of $\theta$.


1.11 (i) Show that the function $f(x; \alpha, \beta) = \frac{1}{\beta} e^{-(x - \alpha)/\beta}$, $x \ge \alpha$, $\alpha \in \mathbb{R}$, $\beta > 0$,
is a p.d.f., and draw its picture.
On the basis of a random sample of size $n$ from this p.d.f., determine
the MLE of:
(ii) $\alpha$ when $\beta$ is known.
(iii) $\beta$ when $\alpha$ is known.
(iv) $\alpha$ and $\beta$ when both are unknown.


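1.12 Refer to the Bivariate Normal distribution discussed in Chapter 4,
Section 5, whose p.d.f. is given by:

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}, \quad x, y \in \mathbb{R},$$

where

$$q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right],$$

$\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1^2, \sigma_2^2 > 0$ and $-1 < \rho < 1$ are the parameters of the distribution.
The objective here is to find the MLE's of these parameters. This
is done in two stages, in the present exercise and the exercises following.
For convenient writing, set $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, and form the likelihood
function for a sample of size $n$, $(X_i, Y_i)$, $i = 1, \ldots, n$, from the
underlying distribution; i.e.,

$$L(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y}) = \left[\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\right]^n \exp\left(-\frac{1}{2}\sum_{i=1}^{n} q_i\right),$$

where

$$q_i = \frac{1}{1 - \rho^2}\left[\left(\frac{x_i - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_i - \mu_1}{\sigma_1}\right)\left(\frac{y_i - \mu_2}{\sigma_2}\right) + \left(\frac{y_i - \mu_2}{\sigma_2}\right)^2\right],$$

and $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, the observed values of the $X_i$'s
and the $Y_i$'s. Also, set

$$\lambda(\boldsymbol{\theta}) = \lambda(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y}) = \log L(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y})
= -n\log(2\pi) - \frac{n}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2 - \frac{n}{2}\log(1 - \rho^2) - \frac{1}{2}\sum_{i=1}^{n} q_i.$$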
i=l 


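(i) Show that the first order partial derivatives of $q$ given above are
provided by the following expressions:

$$\frac{\partial q}{\partial\mu_1} = -\frac{2(x - \mu_1)}{\sigma_1^2(1 - \rho^2)} + \frac{2\rho(y - \mu_2)}{\sigma_1\sigma_2(1 - \rho^2)},$$

$$\frac{\partial q}{\partial\mu_2} = -\frac{2(y - \mu_2)}{\sigma_2^2(1 - \rho^2)} + \frac{2\rho(x - \mu_1)}{\sigma_1\sigma_2(1 - \rho^2)},$$

$$\frac{\partial q}{\partial\sigma_1^2} = -\frac{(x - \mu_1)^2}{\sigma_1^4(1 - \rho^2)} + \frac{\rho(x - \mu_1)(y - \mu_2)}{\sigma_1^3\sigma_2(1 - \rho^2)},$$

$$\frac{\partial q}{\partial\sigma_2^2} = -\frac{(y - \mu_2)^2}{\sigma_2^4(1 - \rho^2)} + \frac{\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2^3(1 - \rho^2)},$$

$$\frac{\partial q}{\partial\rho} = \frac{2}{(1 - \rho^2)^2}\left[\rho\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - (1 + \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \rho\left(\frac{y - \mu_2}{\sigma_2}\right)^2\right].$$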


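(ii) Use the above obtained expressions and $\lambda(\boldsymbol{\theta})$ in order to show that:

$$\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\mu_1} = \frac{n(\bar{x} - \mu_1)}{\sigma_1^2(1 - \rho^2)} - \frac{n\rho(\bar{y} - \mu_2)}{\sigma_1\sigma_2(1 - \rho^2)},$$

$$\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\mu_2} = \frac{n(\bar{y} - \mu_2)}{\sigma_2^2(1 - \rho^2)} - \frac{n\rho(\bar{x} - \mu_1)}{\sigma_1\sigma_2(1 - \rho^2)},$$

$$\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2} = -\frac{n}{2\sigma_1^2} + \frac{\sum_{i=1}^{n}(x_i - \mu_1)^2}{2\sigma_1^4(1 - \rho^2)} - \frac{\rho\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{2\sigma_1^3\sigma_2(1 - \rho^2)},$$

$$\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2} = -\frac{n}{2\sigma_2^2} + \frac{\sum_{i=1}^{n}(y_i - \mu_2)^2}{2\sigma_2^4(1 - \rho^2)} - \frac{\rho\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{2\sigma_1\sigma_2^3(1 - \rho^2)},$$

$$\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\rho} = \frac{n\rho}{1 - \rho^2} - \frac{\rho\sum_{i=1}^{n}(x_i - \mu_1)^2}{\sigma_1^2(1 - \rho^2)^2} - \frac{\rho\sum_{i=1}^{n}(y_i - \mu_2)^2}{\sigma_2^2(1 - \rho^2)^2} + \frac{(1 + \rho^2)\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{\sigma_1\sigma_2(1 - \rho^2)^2}.$$

(iii) Setting $\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\mu_1} = \frac{\partial\lambda(\boldsymbol{\theta})}{\partial\mu_2} = 0$, and solving for $\mu_1$ and $\mu_2$, show that there
is a unique solution given by: $\tilde{\mu}_1 = \bar{x}$ and $\tilde{\mu}_2 = \bar{y}$.
(iv) By setting $\frac{\partial\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2} = \frac{\partial\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2} = \frac{\partial\lambda(\boldsymbol{\theta})}{\partial\rho} = 0$ and replacing $\mu_1$ and $\mu_2$ by the
respective expressions $\bar{x}$ and $\bar{y}$, show that we arrive at the equations:

$$\frac{S_x}{\sigma_1^2} - \frac{\rho S_{xy}}{\sigma_1\sigma_2} = 1 - \rho^2, \qquad \frac{S_y}{\sigma_2^2} - \frac{\rho S_{xy}}{\sigma_1\sigma_2} = 1 - \rho^2,$$

$$\frac{S_x}{\sigma_1^2} + \frac{S_y}{\sigma_2^2} - \frac{(1 + \rho^2) S_{xy}}{\rho\sigma_1\sigma_2} = 1 - \rho^2,$$

where $S_x = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, $S_y = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, and
$S_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$.

(v) In the equations obtained in part (iv), solve for $\sigma_1^2$, $\sigma_2^2$, and $\rho$ in order
to obtain the unique solution:

$$\tilde{\sigma}_1^2 = S_x, \quad \tilde{\sigma}_2^2 = S_y, \quad \tilde{\rho} = S_{xy}/(S_x^{1/2} S_y^{1/2}).$$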


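1.13 The purpose of this exercise is to show that the values $\tilde{\mu}_1$, $\tilde{\mu}_2$, $\tilde{\sigma}_1^2$, $\tilde{\sigma}_2^2$, and
$\tilde{\rho}$ are actually the MLE's of the respective parameters. To this end:
(i) Take the second-order partial derivatives of $\lambda(\boldsymbol{\theta})$, as indicated below,
and show that they are given by the following expressions:

$$\frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1^2} = -\frac{n}{\sigma_1^2(1 - \rho^2)} \stackrel{\text{def}}{=} d_{11}, \qquad
\frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1\partial\mu_2} = \frac{n\rho}{\sigma_1\sigma_2(1 - \rho^2)} \stackrel{\text{def}}{=} d_{12},$$

$$\frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1\partial\sigma_1^2} = -\frac{n(\bar{x} - \mu_1)}{\sigma_1^4(1 - \rho^2)} + \frac{n\rho(\bar{y} - \mu_2)}{2\sigma_1^3\sigma_2(1 - \rho^2)} \stackrel{\text{def}}{=} d_{13},$$

$$\frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1\partial\sigma_2^2} = \frac{n\rho(\bar{y} - \mu_2)}{2\sigma_1\sigma_2^3(1 - \rho^2)} \stackrel{\text{def}}{=} d_{14},$$

$$\frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1\partial\rho} = \frac{2\rho n(\bar{x} - \mu_1)}{\sigma_1^2(1 - \rho^2)^2} - \frac{n(1 + \rho^2)(\bar{y} - \mu_2)}{\sigma_1\sigma_2(1 - \rho^2)^2} \stackrel{\text{def}}{=} d_{15}.$$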
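(ii) In $d_{1i}$, $i = 1, \ldots, 5$, replace the parameters involved by their respective
estimates, and denote the resulting expressions by $\tilde{d}_{1i}$, $i = 1, \ldots, 5$.
Then show that:

$$\tilde{d}_{11} = -\frac{nS_y}{S_xS_y - S_{xy}^2}, \quad \tilde{d}_{12} = \frac{nS_{xy}}{S_xS_y - S_{xy}^2}, \quad \tilde{d}_{13} = \tilde{d}_{14} = \tilde{d}_{15} = 0,$$

where $S_x$, $S_y$, and $S_{xy}$ are given in Exercise 1.12(iv).
(iii) Work as in part (i) in order to show that:

$$d_{21} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_2\partial\mu_1} = \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_1\partial\mu_2} = d_{12}, \qquad
d_{22} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_2^2} = -\frac{n}{\sigma_2^2(1 - \rho^2)},$$

$$d_{23} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_2\partial\sigma_1^2} = \frac{n\rho(\bar{x} - \mu_1)}{2\sigma_1^3\sigma_2(1 - \rho^2)},$$

$$d_{24} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_2\partial\sigma_2^2} = -\frac{n(\bar{y} - \mu_2)}{\sigma_2^4(1 - \rho^2)} + \frac{n\rho(\bar{x} - \mu_1)}{2\sigma_1\sigma_2^3(1 - \rho^2)},$$

$$d_{25} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\mu_2\partial\rho} = \frac{2\rho n(\bar{y} - \mu_2)}{\sigma_2^2(1 - \rho^2)^2} - \frac{n(1 + \rho^2)(\bar{x} - \mu_1)}{\sigma_1\sigma_2(1 - \rho^2)^2}.$$

(iv) In part (iii), replace the parameters by their respective estimates and
denote by $\tilde{d}_{2i}$, $i = 1, \ldots, 5$, the resulting expressions. Then, show that:

$$\tilde{d}_{21} = \tilde{d}_{12}, \quad \tilde{d}_{22} = -\frac{nS_x}{S_xS_y - S_{xy}^2}, \quad \text{and} \quad \tilde{d}_{23} = \tilde{d}_{24} = \tilde{d}_{25} = 0.$$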
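(v) Work as in parts (i) and (iii), and use analogous notation in order to
obtain:

$$d_{31} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2\partial\mu_1} = d_{13}, \qquad
d_{32} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2\partial\mu_2} = d_{23},$$

$$d_{33} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial(\sigma_1^2)^2} = \frac{n}{2\sigma_1^4} - \frac{\sum_{i=1}^{n}(x_i - \mu_1)^2}{\sigma_1^6(1 - \rho^2)} + \frac{3\rho\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{4\sigma_1^5\sigma_2(1 - \rho^2)},$$

$$d_{34} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2\partial\sigma_2^2} = \frac{\rho\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{4\sigma_1^3\sigma_2^3(1 - \rho^2)},$$

$$d_{35} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_1^2\partial\rho} = \frac{\rho\sum_{i=1}^{n}(x_i - \mu_1)^2}{\sigma_1^4(1 - \rho^2)^2} - \frac{(1 + \rho^2)\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{2\sigma_1^3\sigma_2(1 - \rho^2)^2}.$$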

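(vi) Work as in parts (ii) and (iv), and use analogous notation in order to
obtain: $\tilde{d}_{31} = \tilde{d}_{32} = 0$, and

$$\tilde{d}_{33} = -\frac{n(2S_xS_y - S_{xy}^2)}{4S_x^2(S_xS_y - S_{xy}^2)}, \qquad
\tilde{d}_{34} = \frac{nS_{xy}^2}{4S_xS_y(S_xS_y - S_{xy}^2)}, \qquad
\tilde{d}_{35} = \frac{nS_y^{1/2}S_{xy}}{2S_x^{1/2}(S_xS_y - S_{xy}^2)}.$$

(vii) Work as in part (v) in order to obtain:

$$d_{41} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2\partial\mu_1} = d_{14}, \qquad
d_{42} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2\partial\mu_2} = d_{24}, \qquad
d_{43} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2\partial\sigma_1^2} = d_{34},$$

$$d_{44} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial(\sigma_2^2)^2} = \frac{n}{2\sigma_2^4} - \frac{\sum_{i=1}^{n}(y_i - \mu_2)^2}{\sigma_2^6(1 - \rho^2)} + \frac{3\rho\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{4\sigma_1\sigma_2^5(1 - \rho^2)},$$

$$d_{45} \stackrel{\text{def}}{=} \frac{\partial^2\lambda(\boldsymbol{\theta})}{\partial\sigma_2^2\partial\rho} = \frac{\rho\sum_{i=1}^{n}(y_i - \mu_2)^2}{\sigma_2^4(1 - \rho^2)^2} - \frac{(1 + \rho^2)\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2)}{2\sigma_1\sigma_2^3(1 - \rho^2)^2}.$$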

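(viii) Work as in part (vi) in order to get:
$\tilde{d}_{41} = \tilde{d}_{42} = 0$, $\tilde{d}_{43} = \tilde{d}_{34}$, and

$$\tilde{d}_{44} = -\frac{n(2S_xS_y - S_{xy}^2)}{4S_y^2(S_xS_y - S_{xy}^2)}, \qquad
\tilde{d}_{45} = \frac{nS_x^{1/2}S_{xy}}{2S_y^{1/2}(S_xS_y - S_{xy}^2)}.$$

(ix) Work as in part (v), and use analogous notation in order to get:
$d_{51} = d_{15}$, $d_{52} = d_{25}$, $d_{53} = d_{35}$, $d_{54} = d_{45}$, and

$$d_{55} = \frac{n(1 + \rho^2)}{(1 - \rho^2)^2} - \frac{1 + 3\rho^2}{(1 - \rho^2)^3}\left[\frac{1}{\sigma_1^2}\sum_{i=1}^{n}(x_i - \mu_1)^2 + \frac{1}{\sigma_2^2}\sum_{i=1}^{n}(y_i - \mu_2)^2\right]
+ \frac{2\rho(3 + \rho^2)}{\sigma_1\sigma_2(1 - \rho^2)^3}\sum_{i=1}^{n}(x_i - \mu_1)(y_i - \mu_2).$$

(x) Work as in part (vi) in order to obtain:
$\tilde{d}_{51} = \tilde{d}_{15}$, $\tilde{d}_{52} = \tilde{d}_{25}$, $\tilde{d}_{53} = \tilde{d}_{35}$, $\tilde{d}_{54} = \tilde{d}_{45}$, and

$$\tilde{d}_{55} = -\frac{nS_xS_y(S_xS_y + S_{xy}^2)}{(S_xS_y - S_{xy}^2)^2}.$$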


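1.14 In this exercise, it is shown that the solution values $\tilde{\mu}_1 = \bar{x}$, $\tilde{\mu}_2 = \bar{y}$,
$\tilde{\sigma}_1^2 = S_x$, $\tilde{\sigma}_2^2 = S_y$, and $\tilde{\rho} = S_{xy}/(S_x^{1/2}S_y^{1/2})$ are, indeed, the MLE's of the respective
parameters. To this effect, set $\tilde{D}$ for the determinant

$$\tilde{D} = \begin{vmatrix}
\tilde{d}_{11} & \tilde{d}_{12} & \tilde{d}_{13} & \tilde{d}_{14} & \tilde{d}_{15} \\
\tilde{d}_{21} & \tilde{d}_{22} & \tilde{d}_{23} & \tilde{d}_{24} & \tilde{d}_{25} \\
\tilde{d}_{31} & \tilde{d}_{32} & \tilde{d}_{33} & \tilde{d}_{34} & \tilde{d}_{35} \\
\tilde{d}_{41} & \tilde{d}_{42} & \tilde{d}_{43} & \tilde{d}_{44} & \tilde{d}_{45} \\
\tilde{d}_{51} & \tilde{d}_{52} & \tilde{d}_{53} & \tilde{d}_{54} & \tilde{d}_{55}
\end{vmatrix},$$

and let $\tilde{D}_i$ be the determinants taken from $\tilde{D}$ by eliminating the last
$5 - i$, $i = 1, \ldots, 5$, rows and columns; also, set $\tilde{D}_0 = 1$. Thus,

$$\tilde{D}_1 = \tilde{d}_{11}, \quad
\tilde{D}_2 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} \\ \tilde{d}_{21} & \tilde{d}_{22} \end{vmatrix}, \quad
\tilde{D}_3 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} & \tilde{d}_{13} \\ \tilde{d}_{21} & \tilde{d}_{22} & \tilde{d}_{23} \\ \tilde{d}_{31} & \tilde{d}_{32} & \tilde{d}_{33} \end{vmatrix}, \quad
\tilde{D}_4 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} & \tilde{d}_{13} & \tilde{d}_{14} \\ \tilde{d}_{21} & \tilde{d}_{22} & \tilde{d}_{23} & \tilde{d}_{24} \\ \tilde{d}_{31} & \tilde{d}_{32} & \tilde{d}_{33} & \tilde{d}_{34} \\ \tilde{d}_{41} & \tilde{d}_{42} & \tilde{d}_{43} & \tilde{d}_{44} \end{vmatrix}.$$

(i) Use parts (ii), (iv), (vi), (viii), and (x) in Exercise 1.13 in order to
conclude that the determinants $\tilde{D}$ $(= \tilde{D}_5)$ and $\tilde{D}_i$, $i = 1, \ldots, 4$, take
the following forms:

$$\tilde{D} = \begin{vmatrix}
\tilde{d}_{11} & \tilde{d}_{12} & 0 & 0 & 0 \\
\tilde{d}_{21} & \tilde{d}_{22} & 0 & 0 & 0 \\
0 & 0 & \tilde{d}_{33} & \tilde{d}_{34} & \tilde{d}_{35} \\
0 & 0 & \tilde{d}_{43} & \tilde{d}_{44} & \tilde{d}_{45} \\
0 & 0 & \tilde{d}_{53} & \tilde{d}_{54} & \tilde{d}_{55}
\end{vmatrix},$$

$$\tilde{D}_1 = \tilde{d}_{11}, \quad
\tilde{D}_2 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} \\ \tilde{d}_{21} & \tilde{d}_{22} \end{vmatrix}, \quad
\tilde{D}_3 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} & 0 \\ \tilde{d}_{21} & \tilde{d}_{22} & 0 \\ 0 & 0 & \tilde{d}_{33} \end{vmatrix}, \quad
\tilde{D}_4 = \begin{vmatrix} \tilde{d}_{11} & \tilde{d}_{12} & 0 & 0 \\ \tilde{d}_{21} & \tilde{d}_{22} & 0 & 0 \\ 0 & 0 & \tilde{d}_{33} & \tilde{d}_{34} \\ 0 & 0 & \tilde{d}_{43} & \tilde{d}_{44} \end{vmatrix}.$$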

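(ii) Expand the determinants $\tilde{D}_i$, $i = 1, \ldots, 4$, and also use parts (iv)
and (viii) of Exercise 1.13 in order to obtain:

$$\tilde{D}_1 = \tilde{d}_{11}, \quad \tilde{D}_2 = \tilde{d}_{11}\tilde{d}_{22} - (\tilde{d}_{12})^2, \quad \tilde{D}_3 = \tilde{d}_{33}\tilde{D}_2, \quad
\tilde{D}_4 = \left[\tilde{d}_{33}\tilde{d}_{44} - (\tilde{d}_{34})^2\right]\tilde{D}_2.$$

(iii) Expand the determinant $\tilde{D}_5$ $(= \tilde{D})$, and also use parts (viii) and (x)
of Exercise 1.13 in order to get:

$$\tilde{D}_5 = \tilde{D}_2(\tilde{d}_{33}A - \tilde{d}_{34}B + \tilde{d}_{35}C),$$

where

$$A = \tilde{d}_{44}\tilde{d}_{55} - (\tilde{d}_{45})^2, \quad B = \tilde{d}_{34}\tilde{d}_{55} - \tilde{d}_{45}\tilde{d}_{35}, \quad C = \tilde{d}_{34}\tilde{d}_{45} - \tilde{d}_{44}\tilde{d}_{35}.$$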


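(iv) For convenience, set: $S_x = \alpha$, $S_y = \beta$, $S_{xy} = \gamma$, and $S_xS_y - S_{xy}^2 = \alpha\beta - \gamma^2 = \delta$,
so that $\alpha, \beta > 0$ and also $\delta > 0$ by the Cauchy-Schwarz
inequality (see Theorem 1(ii) in Chapter 4). Then use parts (ii),
(vi), and (viii) in Exercise 1.13 in order to express the determinants
$\tilde{D}_i$, $i = 1, \ldots, 4$, in part (ii) in terms of $\alpha$, $\beta$, $\gamma$, and $\delta$ and obtain:

$$\tilde{D}_1 = -\frac{n\beta}{\delta}, \quad
\tilde{D}_2 = \frac{n^2(\alpha\beta - \gamma^2)}{\delta^2} = \frac{n^2}{\delta}, \quad
\tilde{D}_3 = -\frac{n^3(\alpha\beta + \delta)}{4\alpha^2\delta^2}, \quad
\tilde{D}_4 = \frac{n^4}{4\alpha\beta\delta^2}.$$

(v) Use the definition of $A$, $B$, and $C$ in part (iii), as well as the expressions
of $\tilde{d}_{34}$, $\tilde{d}_{35}$, $\tilde{d}_{44}$, $\tilde{d}_{45}$, and $\tilde{d}_{55}$ given in parts (vi), (viii), and (x) of
Exercise 1.13, in conjunction with the notation introduced in part
(iv) of the present exercise, in order to show that:

$$A = \frac{\alpha^3\beta n^2}{2\delta^3}, \quad B = -\frac{\alpha\beta\gamma^2 n^2}{2\delta^3}, \quad C = \frac{\alpha^{1/2}\gamma n^2}{4\beta^{1/2}\delta^2}.$$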








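(vi) Use parts (iii) and (v) here, and parts (ii) and (iv) in Exercise 1.13 in
order to obtain:

$$\tilde{D}_5 = \tilde{D}_2(\tilde{d}_{33}A - \tilde{d}_{34}B + \tilde{d}_{35}C) = \frac{n^2}{\delta}\left(-\frac{\alpha\beta n^3}{4\delta^3}\right) = -\frac{\alpha\beta n^5}{4\delta^4}.$$

(vii) From parts (iv) and (vi) and the fact that $\tilde{D}_0 = 1$, conclude that:
$\tilde{D}_0 > 0$, $\tilde{D}_1 < 0$, $\tilde{D}_2 > 0$, $\tilde{D}_3 < 0$, $\tilde{D}_4 > 0$, and $\tilde{D}_5 < 0$.


Then use a calculus result about the maximum of a function in more
than one variable (see, e.g., Theorem 7.9, pages 151-152, in the book
Mathematical Analysis, Addison-Wesley (1957), by T. M. Apostol)
in order to conclude that $\tilde{\mu}_1$, $\tilde{\mu}_2$, $\tilde{\sigma}_1^2$, $\tilde{\sigma}_2^2$, and $\tilde{\rho}$ are, indeed, the MLE's
of the respective parameters; i.e.,

$$\hat{\mu}_1 = \bar{x}, \quad \hat{\mu}_2 = \bar{y}, \quad \hat{\sigma}_1^2 = S_x, \quad \hat{\sigma}_2^2 = S_y, \quad \hat{\rho} = S_{xy}/(S_x^{1/2}S_y^{1/2}),$$

where

$$S_x = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \quad S_y = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad
S_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$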
Refer to Example 3 and suppose that we are interested in estimating the probability
that 0 events occur; that is, $P_\theta(X = 0) = \frac{e^{-\theta}\theta^0}{0!} = e^{-\theta}$, call it $g_1(\theta)$. Thus,
the estimated quantity is a function of $\theta$ rather than $\theta$ itself. Next, refer to
Example 4 and recall that, if $X \sim f(x; \theta) = \theta e^{-\theta x}$, $x > 0$, then $E_\theta X = 1/\theta$.
Thus, in this case it would be, perhaps, more reasonable to estimate $1/\theta$ and
call it $g_2(\theta)$, rather than $\theta$. Finally, refer to Examples 5(ii) and 7, and consider
the problem of estimating the s.d. $\sigma = +\sqrt{\sigma^2}$ rather than the variance $\sigma^2$. This
is quite meaningful, since, as we know, the s.d. is used as the yardstick for
measuring distances from the mean. In this last case, set $g_3(\sigma^2) = +\sqrt{\sigma^2}$.

The functions $g_1$, $g_2$, and $g_3$ have the common characteristic that they
are one-to-one functions of the parameter involved. The estimation problems
described above are then formulated in a unified way as follows.


THEOREM 1

Let $\hat{\theta} = \hat{\theta}(\mathbf{x})$ be the MLE of $\theta$ on the basis of the observed values $x_1, \ldots, x_n$
of the random sample $X_1, \ldots, X_n$ from the p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$. Also,
let $\theta^* = g(\theta)$ be a one-to-one function defined on $\Omega$ onto $\Omega^* \subseteq \mathbb{R}$. Then
the MLE of $\theta^*$, $\hat{\theta}^*(\mathbf{x})$, is given by $\hat{\theta}^*(\mathbf{x}) = g[\hat{\theta}(\mathbf{x})]$.





PROOF The equation $\theta^* = g(\theta)$ can be solved for $\theta$, on the basis of the
assumption made, and let $\theta = g^{-1}(\theta^*)$. Then

$$L(\theta \mid \mathbf{x}) = L[g^{-1}(\theta^*) \mid \mathbf{x}] = L^*(\theta^* \mid \mathbf{x}), \text{ say}.$$

Thus,

$$\max\{L(\theta \mid \mathbf{x});\ \theta \in \Omega\} = \max\{L^*(\theta^* \mid \mathbf{x});\ \theta^* \in \Omega^*\}. \tag{4}$$

Since the left-hand side in (4) is maximized for $\hat{\theta} = \hat{\theta}(\mathbf{x})$, clearly, the right-hand
side is maximized for $\hat{\theta}^* = g(\hat{\theta})$. ▲


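REMARK 3 On the basis of Theorem 1, then, we have: The MLE of $\exp(-\theta)$
is $\exp(-\bar{x})$; the MLE of $1/\theta$ is $\bar{x}$; and the MLE of $\sigma$ is $\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right]^{1/2}$ for
Example 5(ii), and $\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}$ for Example 7.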
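
The first claim of Remark 3 can be checked by brute force. The sketch below is illustrative, with hypothetical data, grid, and function names. It re-expresses the Poisson likelihood in terms of $\theta^* = e^{-\theta}$, exactly as in the proof of Theorem 1, maximizes it over a grid of $\theta^*$ values in $(0, 1)$, and compares the result with the plug-in value $e^{-\bar{x}}$.

```python
import math

def poisson_log_likelihood(theta, xs):
    """log L(theta | x) without the term -log(prod x_i!), which is free of theta."""
    return -len(xs) * theta + math.log(theta) * sum(xs)

def reparametrized_log_likelihood(theta_star, xs):
    """L*(theta* | x) = L(g^{-1}(theta*) | x), with theta* = g(theta) = exp(-theta)."""
    return poisson_log_likelihood(-math.log(theta_star), xs)

xs = [2, 0, 1, 3, 1, 2, 0, 1]            # hypothetical counts, mean 1.25
x_bar = sum(xs) / len(xs)

plug_in = math.exp(-x_bar)               # g applied to the MLE of theta

# Direct maximization over theta* in (0, 1), on a grid of step 1e-4.
grid = [k / 10000 for k in range(1, 10000)]
grid_argmax = max(grid, key=lambda ts: reparametrized_log_likelihood(ts, xs))

print(plug_in, grid_argmax)
```

The two numbers agree up to the grid spacing, as the theorem predicts.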

However, in as simple a case as that of the $B(1, \theta)$ distribution, the function
$g$ in Theorem 1 may not be one-to-one, and yet we can construct the MLE of
$\theta^* = g(\theta)$. This is the content of the next theorem.





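THEOREM 2

Let $\hat{\theta} = \hat{\theta}(\mathbf{x})$ be the MLE of $\theta$ on the basis of the observed values $x_1, \ldots, x_n$
of the random sample $X_1, \ldots, X_n$ from the p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$. Also,
let $\theta^* = g(\theta)$ be an arbitrary function defined on $\Omega$ into $\Omega^* \subseteq \mathbb{R}$, where,
without loss of generality, we may assume that $\Omega^*$ is the range of $g$, so
that the function $g$ is defined on $\Omega$ onto $\Omega^*$. Then the MLE of $\theta^*$, $\hat{\theta}^*(\mathbf{x})$, is
still given by $\hat{\theta}^*(\mathbf{x}) = g[\hat{\theta}(\mathbf{x})]$.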











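PROOF For each $\theta^* \in \Omega^*$, there may be several $\theta$ in $\Omega$ mapped to the same
$\theta^*$ under $g$. Let $\Omega_{\theta^*}$ be the set of all such $\theta$'s; i.e.,

$$\Omega_{\theta^*} = \{\theta \in \Omega;\ g(\theta) = \theta^*\}.$$

On $\Omega^*$, define the real-valued function $L^*$ by:

$$L^*(\theta^*) = \sup\{L(\theta);\ \theta \in \Omega_{\theta^*}\}.$$

The function $L^*$ may be called the likelihood function induced by $g$. Now,
since $g$ is a function, it follows that $g(\hat{\theta}) = \hat{\theta}^*$ for a unique $\hat{\theta}^*$ in $\Omega^*$, and
$L^*(\hat{\theta}^*) = L(\hat{\theta})$ from the definition of $L^*$. Finally, for every $\theta^* \in \Omega^*$,

$$L^*(\theta^*) = \sup\{L(\theta);\ \theta \in \Omega_{\theta^*}\} \le \max\{L(\theta);\ \theta \in \Omega\} = L(\hat{\theta}) = L^*(\hat{\theta}^*).$$

This last inequality justifies calling $\hat{\theta}^* = g(\hat{\theta})$ the MLE of $\theta^*$. ▲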


(Theorem 2 was adapted from a result established by Peter W. Zehna in the 
Annals of Mathematical Statistics, Vol. 37 (1966), page 744.) 


EXAMPLE 10

Refer to Example 2 and determine the MLE $\hat{g}(\theta)$ of the function $g(\theta) = \theta(1 - \theta)$.


DISCUSSION Here the function $g: (0, 1) \to (0, 1)$ is not one-to-one. However,
by Theorem 2, the MLE $\hat{g}(\theta) = \bar{x}(1 - \bar{x})$, since $\hat{\theta} = \bar{x}$.
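
The induced likelihood $L^*$ from the proof of Theorem 2 can be computed explicitly in this example. The sketch below is only an illustration; the counts, the grid, and the function names are assumptions. For each $\theta^* \in (0, 1/4)$ the set $\Omega_{\theta^*}$ consists of the two roots of $\theta(1 - \theta) = \theta^*$, and $L^*(\theta^*)$ is the larger of the two likelihood values.

```python
import math

def bernoulli_log_likelihood(theta, t, n):
    """log L(theta | x) for B(1, theta) data with t successes in n trials."""
    return t * math.log(theta) + (n - t) * math.log(1.0 - theta)

def induced_log_likelihood(theta_star, t, n):
    """log L*(theta*): maximum of log L(theta) over {theta : theta(1 - theta) = theta*}."""
    root = math.sqrt(1.0 - 4.0 * theta_star)
    preimage = [(1.0 - root) / 2.0, (1.0 + root) / 2.0]
    return max(bernoulli_log_likelihood(th, t, n) for th in preimage)

t, n = 3, 10                              # hypothetical data: x_bar = 0.3
x_bar = t / n
plug_in = x_bar * (1.0 - x_bar)           # g applied to the MLE of theta

# Maximize the induced likelihood over a grid of theta* values in (0, 1/4).
grid = [k / 1000 for k in range(1, 250)]
grid_argmax = max(grid, key=lambda ts: induced_log_likelihood(ts, t, n))

print(plug_in, grid_argmax)
```

Both roots $0.3$ and $0.7$ map to $\theta^* = 0.21$, but only the first one maximizes $L$, which is why the supremum over $\Omega_{\theta^*}$ is needed.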


REMARK 4 Theorem 1 is, of course, a special case of Theorem 2. A suitable
version of Theorem 2 holds for multidimensional parameters. This property
of the MLE is referred to as the invariance property of the MLE for obvious
reasons.

Reviewing the examples in the previous section, we see that the data
$x_1, \ldots, x_n$ are entering into the MLE's in a compactified form, more precisely,
as a real-valued quantity. From this point on, this is all we have at our
disposal; or, perhaps, that is all that was revealed to us by those who collected
the data. In contemplating this situation, one cannot help but wonder what we
are missing by knowing, for example, only $\bar{x}$ rather than the complete array
of data $x_1, \ldots, x_n$. The almost shocking fact of the matter is that, in general,
we are missing absolutely nothing, in terms of information carried by the data
$x_1, \ldots, x_n$, provided the data are condensed in the right way. This is precisely
the concept of sufficiency to be introduced below. For a motivation of the
definition, consider the following example.


EXAMPLE 11

In Example 1, each $x_i$, $i = 1, \ldots, 10$, takes on the value either 0 or 1, and
we are given that $\bar{x} = 0.6$. There are $2^{10} = 1{,}024$ arrangements of 10 0's or
1's with respective probabilities given by $\prod_{i=1}^{10}\theta^{x_i}(1 - \theta)^{1 - x_i}$ for each one of the
1,024 arrangements of 0's and 1's. These probabilities, of course, depend on $\theta$.
Now, restrict attention to the $\binom{10}{6} = 210$ arrangements of 0's and 1's only, which
produce a sum of 6 or an average of 0.6; their probability is $210\,\theta^6(1 - \theta)^4$. Finally,
calculate the conditional probability of each one of these arrangements, given
that the average is 0.6 or that the sum is 6. In other words, calculate

$$P_\theta(X_i = x_i,\ i = 1, \ldots, 10 \mid T = 6), \quad T = \sum_{i=1}^{10} X_i. \tag{5}$$

Suppose that all these conditional probabilities have the same value, which,
in addition, is independent of $\theta$. This would imply two things: First, given
that the sum is 6, all possible arrangements, summing up to 6, have the same
probability independent of the location of occurrences of 1's; and second, this
probability has the same numerical value for all values of $\theta$ in (0, 1). So, from
a probabilistic viewpoint, given the information that the sum is 6, it does not
really matter either what the arrangement is or what the value of $\theta$ is; we can
reconstruct each one of all those arrangements giving sum 6 by choosing each
one of the 210 possible arrangements, with probability 1/210 each. It is in this
sense that, restricting ourselves to the sum and ignoring or not knowing the
individual values, we deprive ourselves of no information about $\theta$.


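We proceed now with the calculation of the probabilities in (5). Although we
can refer to existing results, let us derive the probabilities here.

$$\begin{aligned}
P_\theta(X_i = x_i,\ i = 1, \ldots, 10 \mid T = 6)
&= P_\theta(X_i = x_i,\ i = 1, \ldots, 10,\ T = 6)/P_\theta(T = 6) \\
&= P_\theta(X_i = x_i,\ i = 1, \ldots, 10)/P_\theta(T = 6) \\
&\qquad \text{(since } X_i = x_i,\ i = 1, \ldots, 10, \text{ implies } T = 6\text{)} \\
&= \theta^6(1 - \theta)^4 \Big/ \binom{10}{6}\theta^6(1 - \theta)^4 \quad \text{(since } T \sim B(10, \theta)\text{)} \\
&= 1 \Big/ \binom{10}{6} = 1/210 \ (\simeq 0.005).
\end{aligned}$$

Thus, what was supposed above is, actually, true.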
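
The value 1/210 can also be confirmed by direct enumeration. The following sketch, which is not part of the original text and uses a hypothetical helper name, lists every 0/1 arrangement of length 10 with sum 6, computes the conditional probabilities in exact rational arithmetic, and repeats the computation for two arbitrarily chosen values of $\theta$.

```python
from fractions import Fraction
from itertools import product

def conditional_probs_given_sum(theta, n=10, total=6):
    """P_theta(X = x | T = total) for every 0/1 arrangement x with sum(x) == total."""
    joint = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == total:
            # Joint probability of this particular arrangement.
            joint[x] = theta ** total * (1 - theta) ** (n - total)
    p_total = sum(joint.values())          # equals P_theta(T = total)
    return {x: p / p_total for x, p in joint.items()}

# Two arbitrary (hypothetical) parameter values.
probs_a = conditional_probs_given_sum(Fraction(1, 3))
probs_b = conditional_probs_given_sum(Fraction(4, 5))

print(len(probs_a), set(probs_a.values()), probs_a == probs_b)
```

Every arrangement receives conditional probability 1/210, and the two dictionaries are identical, so the conditional distribution carries no trace of $\theta$.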
This example and the elaboration associated with it lead to the following
definition of sufficiency.


DEFINITION 1

Let $X_1, \ldots, X_n$ be a random sample with p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and
let $T = T(X_1, \ldots, X_n)$ be a statistic (i.e., a known function of the $X_i$'s).
Then, if the conditional distribution of the $X_i$'s, given $T = t$, does not
depend on $\theta$, we say that $T$ is a sufficient statistic for $\theta$.


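REMARK 5 If $g$ is a real-valued one-to-one function defined on the range
of $T$, it is clear that knowing $T$ is equivalent to knowing $T^* = g(T)$, and vice
versa. Thus, if $T$ is a sufficient statistic for $\theta$, so is $T^*$. In particular, if
$T = \sum_{i=1}^{n} X_i$ or $\sum_{i=1}^{n}(X_i - \mu)^2$ (or $\sum_{i=1}^{n}(X_i - \bar{X})^2$) is a sufficient statistic for $\theta$,
so is $\bar{X}$ or $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$ (or $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$).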


REMARK 6 The definition given for one parameter also applies for more 
than one parameter, but then we also need a multidimensional sufficient statis- 
tic, usually, with dimensionality equal to the number of the parameters. In all 
cases, we often use simply the term “sufficient” instead of “sufficient statis- 
tic(s) for 6,” if no confusion is possible. 

As is often the case, definitions do not lend themselves easily to identifying 
the quantity defined. This is also the case in Definition 1. A sufficient statistic 
is, actually, found by way of the theorem stated below. 





THEOREM 3

(Fisher-Neyman Factorization Theorem) Let $X_1, \ldots, X_n$ be a random
sample with p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and let $T = T(X_1, \ldots, X_n)$
be a statistic. Then $T$ is a sufficient statistic for $\theta$, if and only if the joint
p.d.f. of the $X_i$'s may be written as follows:

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n; \theta) = g[T(x_1, \ldots, x_n); \theta]\, h(x_1, \ldots, x_n). \tag{6}$$











The way this theorem applies is the following: One writes out the joint p.d.f.
of the $X_i$'s and then one tries to rewrite it as the product of two factors, one
factor, $g[T(x_1, \ldots, x_n); \theta]$, which contains the $x_i$'s only through the function
$T(x_1, \ldots, x_n)$ and the parameter $\theta$, and another factor, $h(x_1, \ldots, x_n)$, which
involves the $x_i$'s in whatever form but not $\theta$ in any form.


REMARK 7 The theorem just stated also holds for multidimensional parameters
$\boldsymbol{\theta}$, but then the statistic $T$ is also multidimensional, usually of the same
dimension as that of $\boldsymbol{\theta}$. A rigorous proof of the theorem can be given, at least
for the case of discrete $X_i$'s, but we choose to omit it.
In all of the Examples 2-10, the MLE's are, actually, sufficient statistics or
functions thereof, as demonstrated below. This fact should certainly reinforce
our appreciation for these MLE's.


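APPLICATION In Example 2, the p.d.f. is written as follows in a compact
form $f(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i} I_{\{0, 1\}}(x_i)$, so that the joint p.d.f. becomes:

$$L(\theta \mid \mathbf{x}) = \theta^{t}(1 - \theta)^{n - t} \times \prod_{i=1}^{n} I_{\{0, 1\}}(x_i), \quad t = \sum_{i=1}^{n} x_i.$$

Then $g[T(x_1, \ldots, x_n); \theta] = \theta^{t}(1 - \theta)^{n - t}$, and $h(x_1, \ldots, x_n) = \prod_{i=1}^{n} I_{\{0, 1\}}(x_i)$. It
follows that $T = \sum_{i=1}^{n} X_i$ is sufficient and so is $\frac{T}{n} = \bar{X}$.
Examples 3 and 4 are treated similarly.
In Example 5(i),

$$L(\mu \mid \mathbf{x}) = \exp\left[\frac{n\mu(2\bar{x} - \mu)}{2\sigma^2}\right] \times \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right),$$

so that $\bar{X}$ is sufficient for $\mu$. Likewise, in Example 5(ii),

$$L(\sigma^2 \mid \mathbf{x}) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] \times 1,$$

so that $\sum_{i=1}^{n}(X_i - \mu)^2$ is sufficient for $\sigma^2$ and so is $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$.
Example 6 is treated similarly.
In Example 7,

$$L(\mu, \sigma^2 \mid \mathbf{x}) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 - \frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right] \times 1,$$

because $\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}[(x_i - \bar{x}) + (\bar{x} - \mu)]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$.
It follows that the pair of statistics $(\bar{X}, \sum_{i=1}^{n}(X_i - \bar{X})^2)$ is sufficient for the pair
of parameters $(\mu, \sigma^2)$.

Examples 8, 9, and 10 are treated similarly.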


REMARK 8 Under certain regularity conditions, it is always the case that an
MLE is only a function of a sufficient statistic.

Here is another example, in four parts, where a sufficient statistic is deter- 
mined by way of Theorem 3. 


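EXAMPLE 12

On the basis of a random sample of size $n$, $X_1, \ldots, X_n$, from each one of the
p.d.f.'s given below with observed values $x_1, \ldots, x_n$, determine a sufficient
statistic for $\theta$.


(i) $f(x; \theta) = \frac{\theta}{x^{\theta + 1}}$, $x \ge 1$, $\theta \in \Omega = (0, \infty)$.
(ii) $f(x; \theta) = \frac{x}{\theta} e^{-x^2/(2\theta)}$, $x > 0$, $\theta \in \Omega = (0, \infty)$.
(iii) $f(x; \theta) = (1 + \theta)x^{\theta}$, $0 < x < 1$, $\theta \in \Omega = (-1, \infty)$.
(iv) $f(x; \theta) = \frac{\theta}{x^2}$, $x \ge \theta$, $\theta \in \Omega = (0, \infty)$.


DISCUSSION In the first place, the functions given above are, indeed,
p.d.f.'s (see Exercise 2.12). Next, rewriting each one of the p.d.f.'s by using the
indicator function, we have:


(i) $f(x; \theta) = \frac{\theta}{x^{\theta + 1}} I_{[1, \infty)}(x)$, so that:

$$\prod_{i=1}^{n} f(x_i; \theta) = \frac{\theta^n}{\left(\prod_{i=1}^{n} x_i\right)^{\theta + 1}} \times \prod_{i=1}^{n} I_{[1, \infty)}(x_i)
= \frac{\theta^n}{\left(\prod_{i=1}^{n} x_i\right)^{\theta + 1}} \times I_{[1, \infty)}(x_{(1)}),$$

and therefore $\prod_{i=1}^{n} X_i$ is sufficient for $\theta$.
(ii) $f(x; \theta) = \frac{x}{\theta} e^{-x^2/(2\theta)} I_{(0, \infty)}(x)$, so that:

$$\prod_{i=1}^{n} f(x_i; \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} x_i\; e^{-\frac{1}{2\theta}\sum_{i=1}^{n} x_i^2} \prod_{i=1}^{n} I_{(0, \infty)}(x_i)
= \frac{1}{\theta^n} e^{-\frac{1}{2\theta}\sum_{i=1}^{n} x_i^2} \times \prod_{i=1}^{n} x_i\; I_{(0, \infty)}(x_{(1)}),$$

and therefore $\sum_{i=1}^{n} X_i^2$ is sufficient for $\theta$.
(iii) $f(x; \theta) = (1 + \theta)x^{\theta} I_{(0, 1)}(x)$, so that

$$\prod_{i=1}^{n} f(x_i; \theta) = (1 + \theta)^n\left(\prod_{i=1}^{n} x_i\right)^{\theta} \prod_{i=1}^{n} I_{(0, 1)}(x_i)
= (1 + \theta)^n\left(\prod_{i=1}^{n} x_i\right)^{\theta} \times I_{(0, 1)}(x_{(1)})\, I_{(0, 1)}(x_{(n)}),$$

and therefore $\prod_{i=1}^{n} X_i$ is sufficient for $\theta$.
(iv) $f(x; \theta) = \frac{\theta}{x^2} I_{[\theta, \infty)}(x)$, so that:

$$\prod_{i=1}^{n} f(x_i; \theta) = \frac{\theta^n}{\prod_{i=1}^{n} x_i^2}\prod_{i=1}^{n} I_{[\theta, \infty)}(x_i)
= \theta^n I_{[\theta, \infty)}(x_{(1)}) \times \frac{1}{\prod_{i=1}^{n} x_i^2},$$

and therefore $X_{(1)}$ is sufficient for $\theta$.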
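
One practical consequence of the factorization in part (i) is that two samples with the same value of $\prod x_i$ have likelihood functions differing only by the factor $h$, which here equals 1 for both. The sketch below illustrates this with hypothetical samples and a hypothetical function name: the first two samples share the product 24 and yield identical log-likelihoods for every $\theta$ tried, while a third sample with a different product does not.

```python
import math

def pareto_log_likelihood(theta, xs):
    """log of prod f(x_i; theta) for f(x; theta) = theta / x^(theta + 1), x >= 1."""
    if min(xs) < 1:
        return float("-inf")               # the indicator I_[1, inf)(x_(1)) is 0
    return len(xs) * math.log(theta) - (theta + 1) * sum(math.log(x) for x in xs)

sample_a = [2.0, 3.0, 4.0]                 # product 24
sample_b = [1.5, 2.0, 8.0]                 # product 24 as well
sample_c = [2.0, 3.0, 5.0]                 # product 30

thetas = [0.5, 1.0, 2.0, 5.0]              # arbitrary parameter values
same_product_gap = [abs(pareto_log_likelihood(th, sample_a) -
                        pareto_log_likelihood(th, sample_b)) for th in thetas]
different_product_gap = [pareto_log_likelihood(th, sample_a) -
                         pareto_log_likelihood(th, sample_c) for th in thetas]

print(same_product_gap)
print(different_product_gap)
```

The second list changes with $\theta$, which shows that the sample enters the likelihood through $\prod x_i$ and through nothing else.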


This section is concluded with two desirable asymptotic properties of an
MLE. The first is consistency (in the probability sense), and the other is asymptotic
normality. However, we will not bother either to list the conditions needed
or to justify the results stated.


THEOREM 4

Let $\hat{\theta}_n = \hat{\theta}_n(X_1, \ldots, X_n)$ be the MLE of $\theta \in \Omega \subseteq \mathbb{R}$ based on the random
sample $X_1, \ldots, X_n$ with p.d.f. $f(\cdot; \theta)$. Then, under certain regularity
conditions, $\{\hat{\theta}_n\}$ is consistent in the probability sense; that is, $\hat{\theta}_n \to \theta$ in
$P_\theta$-probability as $n \to \infty$.











The usefulness of this result is, of course, that, for sufficiently large $n$, $\hat{\theta}_n$ is
as close to (the unknown) $\theta$ as we please with probability as close to 1 as we
desire. The tool usually employed in establishing Theorem 4 is either the Weak
Law of Large Numbers (WLLN) or the Tchebichev inequality. In exercises at
the end of this section, the validity of Theorem 4 is illustrated in some of the
examples of the previous section.


THEOREM 5

In the notation of Theorem 4, and under suitable regularity conditions, the
MLE $\hat{\theta}_n$ is asymptotically normal. More precisely, under $P_\theta$-probability,

$$\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, \sigma_\theta^2), \quad \text{as } n \to \infty,$$

$$\text{where } \sigma_\theta^2 = 1/I(\theta) \text{ and } I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta}\log f(X; \theta)\right]^2, \quad X \sim f(\cdot; \theta). \tag{7}$$


To state it loosely, $\hat{\theta}_n \simeq N(\theta, \sigma_\theta^2/n)$ for sufficiently large $n$. That is, the
MLE $\hat{\theta}_n$ is approximately Normally distributed around $\theta$, and therefore various
probabilities related to it may be approximately calculated in principle. The
justification of the theorem is done by using a Taylor expansion of the derivative
$\frac{\partial}{\partial\theta}\log L(\theta \mid \mathbf{X})$ up to terms of third-order, employing the fact that
$\frac{\partial}{\partial\theta}\log L(\theta \mid \mathbf{X})\big|_{\theta = \hat{\theta}_n} = 0$, and suitably utilizing the WLLN and the Central Limit Theorem
(CLT). For some applications of this theorem, see, e.g., Example 21, page 324,
of the book A Course in Mathematical Statistics, 2nd edition, Academic Press
(1977), by G. G. Roussas.
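
Theorems 4 and 5 can be illustrated by simulation. The sketch below is a hedged example, not a proof: it assumes the Negative Exponential model $f(x; \theta) = \theta e^{-\theta x}$ with a hypothetical true value $\theta = 2$, for which $I(\theta) = 1/\theta^2$ and hence $\sigma_\theta^2 = \theta^2 = 4$. The seed, the sample sizes, and the function name are likewise arbitrary choices. Python's `random.expovariate` takes the rate $\theta$ as its argument.

```python
import math
import random
import statistics

def exponential_rate_mle(sample):
    """MLE of theta in f(x; theta) = theta * exp(-theta * x): 1 / (sample mean)."""
    return len(sample) / sum(sample)

rng = random.Random(12345)     # fixed seed so that the sketch is reproducible
theta = 2.0                    # hypothetical true parameter

# Consistency: the estimate settles near theta as n grows.
estimates = {
    n: exponential_rate_mle([rng.expovariate(theta) for _ in range(n)])
    for n in (10, 1000, 100000)
}

# Asymptotic normality: sqrt(n) * (theta_hat - theta) should have variance
# close to 1 / I(theta) = theta^2 when n is large.
n, replications = 200, 2000
scaled = [
    math.sqrt(n) * (exponential_rate_mle([rng.expovariate(theta) for _ in range(n)]) - theta)
    for _ in range(replications)
]
empirical_variance = statistics.variance(scaled)

print(estimates)
print(empirical_variance, theta ** 2)
```

With $n = 200$ the empirical variance is typically within a few percent of $\theta^2$; the small excess that remains is a finite-sample effect.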


REMARK 9 The quantity $I(\theta)$ is referred to as the Fisher information
carried by the random sample $X_1, \ldots, X_n$ about the parameter $\theta$. A justification
for the "information" stems from the fact that $\sigma_\theta^2 = 1/I(\theta)$, so that the larger $I(\theta)$
is the smaller the variance $\sigma_\theta^2$ is, and therefore the more concentrated $\hat{\theta}$ is
about $\theta$. The opposite happens for small values of $I(\theta)$.
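
As a worked instance of formula (7), the Fisher information of the $B(1, \theta)$ distribution can be computed directly from its definition as an expectation. The sketch below uses exact rational arithmetic; the function name and the values of $\theta$ and $n$ are hypothetical. It confirms that $I(\theta) = 1/[\theta(1 - \theta)]$, so that the approximate variance $\sigma_\theta^2/n$ of the MLE $\bar{X}$ equals $\theta(1 - \theta)/n$.

```python
from fractions import Fraction

def bernoulli_fisher_information(theta):
    """I(theta) = E_theta[(d/dtheta log f(X; theta))^2]
    for f(x; theta) = theta^x * (1 - theta)^(1 - x), x = 0 or 1."""
    info = 0
    for x in (0, 1):
        prob = theta if x == 1 else 1 - theta
        # d/dtheta log f(x; theta) = x/theta - (1 - x)/(1 - theta)
        score = Fraction(x) / theta - Fraction(1 - x) / (1 - theta)
        info += prob * score ** 2
    return info

theta = Fraction(3, 10)        # hypothetical parameter value
n = 50                         # hypothetical sample size

information = bernoulli_fisher_information(theta)
asymptotic_variance = 1 / (n * information)

print(information, asymptotic_variance)
```

In this model the approximate variance coincides with the exact variance of $\bar{X}$, which is not the case in general.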


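2.1 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with the Negative Exponential p.d.f. $f(x; \theta) = \theta e^{-\theta x}$,
$x > 0$, $\theta \in \Omega = (0, \infty)$. Then:
(i) Show that $1/\bar{X}$ is the MLE of $\theta$.
(ii) Use Theorem 1 in order to conclude that the MLE of $\theta^*$ in the
parameterized form $f(x; \theta^*) = \frac{1}{\theta^*} e^{-x/\theta^*}$, $x > 0$, is $\bar{X}$.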

2.2 Let $X$ be a r.v. denoting the life span of an equipment. Then the reliability
of the equipment at time $x$, $R(x)$, is defined as the probability that $X > x$;
i.e., $R(x) = P(X > x)$. Now, suppose that $X$ has the Negative Exponential
p.d.f. $f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$. Then:
(i) Calculate the reliability $R(x; \theta)$ based on this r.v. $X$.
(ii) Use Theorem 1 in order to determine the MLE of $R(x; \theta)$, on the basis
of a random sample $X_1, \dots, X_n$ from the underlying p.d.f.


2.3 Let $X$ be a r.v. describing the lifetime of a certain equipment, and suppose
that the p.d.f. of $X$ is $f(x; \theta) = \theta e^{-\theta x}$, $x > 0$, $\theta \in \Omega = (0, \infty)$.
(i) Show that the probability that $X$ is greater than or equal to $t$ time
units is $g(\theta) = e^{-t\theta}$.
(ii) We know (see Exercise 2.1) that the MLE of $\theta$, based on a random
sample of size $n$ from the above p.d.f., is $\hat{\theta} = 1/\bar{X}$. Then determine
the MLE of $g(\theta)$.


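2.4 Consider the independent r.v.'s $X_1, \dots, X_n$ with the Weibull p.d.f. $f(x; \theta) = \frac{\gamma}{\theta} x^{\gamma-1} \exp(-x^{\gamma}/\theta)$, $x > 0$, $\theta \in \Omega = (0, \infty)$, $\gamma > 0$ known, and:
(i) Show that $\hat{\theta} = \left(\sum_{i=1}^n X_i^{\gamma}\right)/n$ is the MLE of $\theta$.
(ii) Take $\gamma = 1$ and relate the result in part (i) to the result in Exercise
2.1(ii).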


2.5 Let $X_1, \dots, X_n$ be a random sample of size $n$ from the $N(\mu, \sigma^2)$ distribution,
where both $\mu$ and $\sigma^2$ are unknown. Set $\theta = (\mu, \sigma^2)$ and let $p$ be a
(known) number with $0 < p < 1$. Then:
(i) Show that the point $c$ for which $P_\theta(\bar{X} \le c) = p$ is given by: $c = \mu + \frac{\sigma}{\sqrt{n}} \Phi^{-1}(p)$.
(ii) Given that the MLE's of $\mu$ and $\sigma^2$ are, respectively, $\hat{\mu} = \bar{X}$ and
$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$, determine the MLE of $c$, call it $\hat{c}$.
(iii) Express $\hat{c}$ in terms of the $X_i$'s, if $n = 25$ and $p = 0.95$.


2.6 (i) Show that the function $f(x; \theta) = \theta x^{-(\theta+1)}$, $x > 1$, $\theta \in \Omega = (0, \infty)$ is
a p.d.f.
(ii) On the basis of a random sample of size $n$ from this p.d.f., show
that the statistic $X_1 \cdots X_n$ is sufficient for $\theta$, and so is the statistic
$\sum_{i=1}^n \log X_i$.


2.7 Let $X$ be a r.v. having the Geometric p.d.f. $f(x; \theta) = \theta(1 - \theta)^{x-1}$, $x = 1, 2, \dots$, $\theta \in \Omega = (0, 1)$. Then show that $X$ is sufficient for $\theta$.


2.8 In reference to Exercise 1.9, show that $\sum_{i=1}^n |X_i|$ is a sufficient statistic
for $\theta$.


2.9 (i) In reference to Example 3, use Theorem 3 in order to find a sufficient
statistic for $\theta$.
(ii) Do the same in reference to Example 4.
(iii) Do the same in reference to Example 6(i), (ii).


2.10 Same as in Exercise 2.9 in reference to Examples 8 and 9.


2.11 Refer to Exercise 1.11, and determine:
(i) A sufficient statistic for $\alpha$ when $\beta$ is known.
(ii) Also, a sufficient statistic for $\beta$ when $\alpha$ is known.
(iii) A set of sufficient statistics for $\alpha$ and $\beta$ when they are both unknown.


2.12 Show that the functions (i)—(iv) given in Example 12 are, indeed, p.d.f.'s. 

Perhaps the second most popular method of estimating a parameter is that
based on the concepts of unbiasedness and variance. This method will be
discussed here to a certain extent and will also be illustrated by specific
examples.

To start with, let $X_1, \dots, X_n$ be a random sample with p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$,
and let us introduce the notation $U = U(X_1, \dots, X_n)$ for an estimate of $\theta$.

DEFINITION 2

The estimate $U$ is said to be unbiased if $E_\theta U = \theta$ for all $\theta \in \Omega$.


Some examples of unbiased estimates follow. 


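EXAMPLE 13 Let $X_1, \dots, X_n$ be a random sample from any one of the following distributions:

(i) $B(1, \theta)$, $\theta \in (0, 1)$. Then the sample mean $\bar{X}$ is an unbiased estimate of $\theta$.

(ii) $P(\theta)$, $\theta > 0$. Then again $\bar{X}$ is an unbiased estimate of $\theta$. Here, since $E_\theta X_1 = \mathrm{Var}_\theta(X_1) = \theta$, $\bar{X}$ is also an unbiased estimate of the variance.

(iii) $N(\theta, \sigma^2)$, $\theta \in \mathbb{R}$, $\sigma$ known. Then, once again, $\bar{X}$ is an unbiased estimate
of $\theta$.

(iv) $N(\mu, \theta)$, $\mu$ known, $\theta > 0$. Then the sample variance $\frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$ is an
unbiased estimate of $\theta$. This is so because $\sum_{i=1}^n \left(\frac{X_i - \mu}{\sqrt{\theta}}\right)^2 \sim \chi_n^2$, so that

$$E_\theta \sum_{i=1}^n \left(\frac{X_i - \mu}{\sqrt{\theta}}\right)^2 = n \quad \text{or} \quad E_\theta\left[\frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2\right] = \theta.$$

(v) Gamma with $\alpha = \theta$ and $\beta = 1$. Then $\bar{X}$ is an unbiased estimate of $\theta$. This
is so because, in the Gamma distribution, the expectation is $\alpha\beta$, so that
for $\alpha = \theta$ and $\beta = 1$, $E_\theta X_1 = \theta$ and hence $E_\theta \bar{X} = \theta$.

(vi) Gamma with $\alpha = 1$ and $\beta = \theta$, $\theta > 0$ (which gives the reparameterized
Negative Exponential distribution). Then $\bar{X}$ is an unbiased estimate of $\theta$
as explained in part (v).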


EXAMPLE 14 Let $X_1, \dots, X_n$ be a random sample from the $U(0, \theta)$ ($\theta > 0$) distribution. Determine an
unbiased estimate of $\theta$.


DISCUSSION Let $Y_n = \max(X_1, \dots, X_n)$; i.e., the largest order statistic of
the $X_i$'s. Then, by (29) in Chapter 6, the p.d.f. of $Y_n$ is given by:

$$g_n(y) = n[F(y)]^{n-1} f(y), \quad \text{for } 0 < y < \theta \text{ (and 0 otherwise)},$$

where:

$$f(y) = \begin{cases} \frac{1}{\theta}, & 0 < y < \theta \\ 0, & \text{otherwise}, \end{cases} \qquad F(y) = \begin{cases} 0, & y \le 0 \\ \frac{y}{\theta}, & 0 < y \le \theta \\ 1, & y > \theta. \end{cases}$$

Then, for $0 < y < \theta$, $g_n(y) = n\left(\frac{y}{\theta}\right)^{n-1} \frac{1}{\theta} = \frac{n}{\theta^n} y^{n-1}$, so that

$$E_\theta Y_n = \int_0^\theta y \cdot \frac{n}{\theta^n} y^{n-1}\, dy = \frac{n}{\theta^n} \int_0^\theta y^n\, dy = \frac{n}{(n+1)\theta^n}\, y^{n+1} \Big|_0^\theta = \frac{n}{n+1}\,\theta.$$

It follows that $E_\theta\left(\frac{n+1}{n} Y_n\right) = \theta$, so that $\frac{n+1}{n} Y_n$ is an unbiased estimate of $\theta$.

The desirability of unbiasedness of an estimate stems from the interpretation
of the expectation as an average value. Typically, one may construct many
unbiased estimates for the same parameter $\theta$. This fact then raises the question
of selecting one such estimate from the class of all unbiased estimates. Here is
where the concept of the variance enters the picture. From two unbiased
estimates $U_1 = U_1(X_1, \dots, X_n)$ and $U_2 = U_2(X_1, \dots, X_n)$ of $\theta$, one would select
the one with the smaller variance. This estimate will be more concentrated
around $\theta$ than the other. Pictorially, this is illustrated by Figure 9.2.


Figure 9.2 (a) p.d.f. of $U_1$ (for a fixed $\theta$); (b) p.d.f. of $U_2$ (for a fixed $\theta$)
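
The selection between two unbiased estimates on the basis of variance can be illustrated by simulation. The sketch below is not from the original text: it compares, for the $U(0, \theta)$ distribution, the estimate $\frac{n+1}{n} Y_n$ of Example 14 with the estimate $2\bar{X}$, which is introduced here only for illustration (it is unbiased because $E_\theta X_1 = \theta/2$). The values of `theta`, `n`, and `reps` are arbitrary choices.

```python
import random
import statistics

def two_estimates(theta, n, rng):
    """Return (2 * sample mean, (n + 1)/n * largest order statistic) for one U(0, theta) sample."""
    sample = [rng.uniform(0.0, theta) for _ in range(n)]
    u1 = 2.0 * sum(sample) / n          # unbiased because E X = theta / 2
    u2 = (n + 1) / n * max(sample)      # unbiased by Example 14
    return u1, u2

rng = random.Random(1)
theta, n, reps = 5.0, 10, 20000         # illustrative values, not from the text
draws = [two_estimates(theta, n, rng) for _ in range(reps)]
u1_values = [d[0] for d in draws]
u2_values = [d[1] for d in draws]

mean_u1, mean_u2 = statistics.fmean(u1_values), statistics.fmean(u2_values)
var_u1, var_u2 = statistics.pvariance(u1_values), statistics.pvariance(u2_values)

print("means:", mean_u1, mean_u2)       # both should be close to theta
print("variances:", var_u1, var_u2)     # the second should be clearly smaller
```

Both simulated means are close to $\theta$, while the estimate based on $Y_n$ shows the smaller variance, so it would be the one selected.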























The next natural step is to look for an unbiased estimate which has the 
smallest variance in the class of all unbiased estimates, and this should happen 
for all 0 € Q. Thus, we are led to the following concept. 


DEFINITION 3

The unbiased estimate $U = U(X_1, \dots, X_n)$ of $\theta$ is said to be Uniformly
Minimum Variance Unbiased (UMVU), if for any other unbiased estimate
$V = V(X_1, \dots, X_n)$, it holds that:

$$\mathrm{Var}_\theta(U) \le \mathrm{Var}_\theta(V) \quad \text{for all } \theta \in \Omega.$$


That a UMVU estimate is desirable is more or less indisputable (see, however,
Exercise 3.18). The practical question which then arises is how one goes
about finding such an estimate. The process of seeking a UMVU estimate is
facilitated by the Cramér-Rao inequality. First, this inequality is
stated, and then we describe how it is used.


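THEOREM 6

(Cramér-Rao inequality) Let $X_1, \dots, X_n$ be a random sample with
p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and suppose certain regularity conditions are
met. Then, for any unbiased estimate $U = U(X_1, \dots, X_n)$ of $\theta$, it holds
that:

$$\mathrm{Var}_\theta(U) \ge \frac{1}{nI(\theta)}, \quad \text{for all } \theta \in \Omega, \qquad (8)$$

where

$$I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta} \log f(X; \theta)\right]^2, \quad X \sim f(\cdot\,; \theta). \qquad (9)$$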








REMARK 10 The unspecified conditions mentioned in the formulation of
the theorem include the assumption that the domain of $x$ in the p.d.f. $f(x; \theta)$
does not depend on $\theta$; thus, the $U(0, \theta)$ distribution, for example, is left out.
Also, the conditions include the validity of interchanging the operations of
differentiation and integration in certain expressions. The proof of Theorem 6
is relatively long and involves an extensive list of regularity conditions. It may
be found in considerable detail in Subsection 12.4.1 of Chapter 12 of the book
A Course in Mathematical Statistics, 2nd edition, Academic Press (1997),
by G. G. Roussas. Actually, in the reference just cited what is proved is a
generalized version of Theorem 6, where the estimated function is a real-valued
function of $\theta$, $g(\theta)$, rather than $\theta$ itself.


REMARK 11 It can be shown that, under suitable conditions, the quantity
$I(\theta)$ in (9) may also be calculated as follows:

$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X; \theta)\right]. \qquad (10)$$

This expression is often easier to calculate.

The Cramér-Rao inequality is used in the following way.

(i) Calculate the Fisher information either through (9) or by way of (10).
(ii) Form the Cramér-Rao (C-R) lower bound figuring in inequality (8).
(iii) Try to identify an unbiased estimate whose variance is equal to the C-R
lower bound (for all $\theta \in \Omega$). If such an estimate is found,
(iv) Declare the estimate described in (iii) as the UMVU estimate of $\theta$.


In connection with steps (i)-(iv), it should be noted that it is possible that
a UMVU estimate exists and yet such an estimate is not located through this
process. The reason for such a failure is that the C-R inequality provides,
simply, a lower bound for the variances of unbiased estimates, which may be
strictly smaller than the variance of a UMVU estimate. It is, nevertheless, a
good try!

The use of the inequality will be illustrated by two examples; other cases 
are left as exercises. 
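
EXAMPLE 15 Refer to Example 2 and seek a UMVU estimate of $\theta$ through the C-R inequality.

DISCUSSION Here $f(x; \theta) = \theta^x (1 - \theta)^{1-x}$, $x = 0, 1$, so that:

(i) $\log f(X; \theta) = X \log\theta + (1 - X) \log(1 - \theta)$,

$$\frac{\partial}{\partial\theta} \log f(X; \theta) = \frac{X}{\theta} - \frac{1 - X}{1 - \theta} \quad \text{and} \quad \frac{\partial^2}{\partial\theta^2} \log f(X; \theta) = -\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2},$$

$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X; \theta)\right] = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2} = \frac{1}{\theta(1 - \theta)}, \quad \text{since } E_\theta X = \theta.$$

(ii) The C-R lower bound $= \frac{1}{nI(\theta)} = \theta(1 - \theta)/n$.

(iii) Consider $U = \bar{X}$. Then $E_\theta \bar{X} = \theta$, so that $\bar{X}$ is unbiased, and next $\sigma_\theta^2(\bar{X}) = \theta(1 - \theta)/n = 1/nI(\theta)$, since $\sigma_\theta^2(X) = \theta(1 - \theta)$.

(iv) The estimate $\bar{X}$ is UMVU.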
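
The calculations of Example 15 can be verified exactly, since $X$ takes only the values 0 and 1. The sketch below is an illustration added here, with the function name and the values of `theta` and `n` chosen arbitrarily: it computes $I(\theta)$ both through (9) and through (10), and compares the resulting C-R lower bound with the variance of $\bar{X}$.

```python
def fisher_information_bernoulli(theta):
    """Return I(theta) for B(1, theta), computed via (9) and via (10)."""
    pmf = {0: 1.0 - theta, 1: theta}
    # Score: d/dtheta log f(x; theta) = x/theta - (1 - x)/(1 - theta)
    score = {x: x / theta - (1 - x) / (1.0 - theta) for x in pmf}
    # Second derivative: -x/theta^2 - (1 - x)/(1 - theta)^2
    second = {x: -x / theta**2 - (1 - x) / (1.0 - theta) ** 2 for x in pmf}
    via_9 = sum(pmf[x] * score[x] ** 2 for x in pmf)     # E[(score)^2]
    via_10 = -sum(pmf[x] * second[x] for x in pmf)       # -E[second derivative]
    return via_9, via_10

theta, n = 0.3, 50                       # illustrative values, not from the text
via_9, via_10 = fisher_information_bernoulli(theta)
cr_bound = 1.0 / (n * via_10)            # step (ii): the C-R lower bound
var_xbar = theta * (1.0 - theta) / n     # step (iii): variance of the sample mean

print("I(theta) via (9):", via_9, "via (10):", via_10)
print("C-R bound:", cr_bound, "Var(sample mean):", var_xbar)
```

The two expressions for $I(\theta)$ agree, and the variance of $\bar{X}$ attains the bound, which is what step (iv) relies on.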


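EXAMPLE 16 Refer to Example 4 and use the following parameterization:

$$X \sim f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x > 0, \quad \text{so that } E_\theta X = \theta, \ \sigma_\theta^2(X) = \theta^2.$$

Then seek a UMVU estimate for $\theta$ through the C-R inequality.

DISCUSSION Here

(i) $\log f(X; \theta) = -\log\theta - \frac{X}{\theta}$,

$$\frac{\partial}{\partial\theta} \log f(X; \theta) = -\frac{1}{\theta} + \frac{X}{\theta^2} \quad \text{and} \quad \frac{\partial^2}{\partial\theta^2} \log f(X; \theta) = \frac{1}{\theta^2} - \frac{2X}{\theta^3},$$

$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X; \theta)\right] = -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3} = \frac{1}{\theta^2}.$$

(ii) The C-R lower bound $= \frac{1}{nI(\theta)} = \theta^2/n$.

(iii) Consider $U = \bar{X}$. Then $E_\theta U = \theta$, so that $\bar{X}$ is unbiased, and $\sigma_\theta^2(\bar{X}) = \frac{\theta^2}{n} = 1/nI(\theta)$.

(iv) The estimate $\bar{X}$ is UMVU.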


There is an alternative way of looking for UMVU estimates, in particular,
when the approach by means of the Cramér-Rao inequality fails to produce
such an estimate. This approach hinges heavily on the concept of sufficiency
already introduced and also on an additional technical concept, so-called
completeness. Completeness says, in effect,
that the only unbiased estimate of 0 is essentially the zero statistic. More
technically, if $T$ is a r.v. with p.d.f. $f_T(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, the family $\{f_T(\cdot\,; \theta);\ \theta \in \Omega\}$ (or
the r.v. $T$) is said to be complete if, for $h: \mathbb{R} \to \mathbb{R}$, $E_\theta h(T) = 0$ for all $\theta \in \Omega$
implies $h(t)$ is, essentially, equal to 0. For the precise definition and a number
of illustrative examples, the reader is referred to Section 11.2 of Chapter
11 in the reference cited in Remark 10 here. The concepts of sufficiency and
completeness combined lead constructively to a UMVU estimate by means
of two theorems, the Rao-Blackwell and the Lehmann-Scheffé theorem. The
procedure is summarized in the following result.


THEOREM 7

(Rao-Blackwell, Lehmann-Scheffé) Let $X_1, \dots, X_n$ be a random
sample with p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and let $T = T(X_1, \dots, X_n)$ be
a sufficient statistic for $\theta$ (which is complete). Let $U = U(X_1, \dots, X_n)$ be
any unbiased estimate of $\theta$, and define the statistic $\varphi(T)$ by the relation:

$$\varphi(T) = E_\theta(U \mid T). \qquad (11)$$

Then $\varphi(T)$ is unbiased, $\mathrm{Var}_\theta[\varphi(T)] \le \mathrm{Var}_\theta(U)$ for all $\theta \in \Omega$, and, indeed,
$\varphi(T)$ is a UMVU estimate of $\theta$. If $U$ is already a function of $T$ only, then
the conditioning in (11) is superfluous.








PROOF (ROUGH OUTLINE) That $\varphi(T)$ is independent of $\theta$ (and hence a statistic),
despite the fact that we use quantities depending on $\theta$ in forming the
conditional expectation, is due to the sufficiency of $T$. Recall that sufficiency of
$T$ means that the conditional distribution of $U$, given $T$, is independent of $\theta$,
and hence so is the expectation of $U$ formed by using this conditional distribution.
That $\varphi(T)$ is unbiased is due to a property of the conditional expectation
(namely, for two r.v.'s $X$ and $Y$: $E[E(X \mid Y)] = EX$), and the inequality
involving the variances is also due to a property of the variance for conditional
expectations (namely, $\mathrm{Var}[E(X \mid Y)] \le \mathrm{Var}(X)$). The concept of completeness
guarantees, through the Lehmann-Scheffé theorem, that no matter which
unbiased estimate $U$ we start out with, we end up (essentially) with the same
UMVU estimate $\varphi(T)$ through the procedure (11), which is known as
Rao-Blackwellization. ▲


REMARK 12 This theorem also applies suitably to multidimensional
parameters $\theta$, although it must be stated here that a version of the Cramér-Rao
inequality also exists for such parameters.

The following examples illustrate how one goes about applying Theorem 7
in concrete cases.


EXAMPLE 17 Determine the UMVU estimate of $\theta$ on the basis of the random sample $X_1, \dots, X_n$
from the distribution $P(\theta)$.


DISCUSSION Perhaps the simplest unbiased estimate of $\theta$ is $X_1$, and we
already know that $T = X_1 + \cdots + X_n$ is sufficient for $\theta$. For the
Rao-Blackwellization of $X_1$, we need the conditional distribution of $X_1$, given $T = t$.
It is known, however (see Exercise 2.10(ii), in Chapter 5), that this conditional
distribution is $B(t, \frac{1}{n})$; that is, $P_\theta(X_1 = x \mid T = t) = \binom{t}{x} \left(\frac{1}{n}\right)^x \left(1 - \frac{1}{n}\right)^{t-x}$.
It follows that $E_\theta(X_1 \mid T = t) = \frac{t}{n}$, so that $\varphi(T) = E_\theta(X_1 \mid T) = \frac{T}{n} = \bar{X}$.
It so happens that the conditions of Theorem 7 hold (see Exercise 3.17) and
therefore $\bar{X}$ is the UMVU estimate of $\theta$.
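
The conditional distribution used in this Rao-Blackwellization can be checked exactly. The sketch below is an added illustration (function names and the values of `n`, `t`, and `theta` are assumptions made here): it uses the fact that $T - X_1$ is $P((n-1)\theta)$ and independent of $X_1$, computes $P_\theta(X_1 = x \mid T = t)$ from Poisson probabilities, and compares it with the $B(t, 1/n)$ p.d.f.

```python
import math

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)

def conditional_pmf_x1_given_t(x, t, n, theta):
    """P(X1 = x | T = t) for i.i.d. P(theta) r.v.'s, using T - X1 ~ P((n - 1) * theta)."""
    joint = poisson_pmf(x, theta) * poisson_pmf(t - x, (n - 1) * theta)
    return joint / poisson_pmf(t, n * theta)

def binomial_pmf(x, t, p):
    return math.comb(t, x) * p**x * (1.0 - p) ** (t - x)

n, t = 5, 7                               # illustrative values, not from the text
max_gaps, cond_means = {}, {}
for theta in (0.5, 2.0):
    cond = [conditional_pmf_x1_given_t(x, t, n, theta) for x in range(t + 1)]
    binom = [binomial_pmf(x, t, 1.0 / n) for x in range(t + 1)]
    max_gaps[theta] = max(abs(c - b) for c, b in zip(cond, binom))
    cond_means[theta] = sum(x * c for x, c in enumerate(cond))   # E(X1 | T = t)

print("largest gap to B(t, 1/n):", max_gaps)
print("E(X1 | T = t):", cond_means, "t/n =", t / n)
```

The conditional probabilities do not change with $\theta$, as sufficiency of $T$ requires, and the conditional mean equals $t/n$.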

A well-known case where Theorem 7 works whereas Theorem 6 does not
(or more properly, their suitable versions for two parameters do or do not) is
illustrated by the following example.


EXAMPLE 18 Let $X_1, \dots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution, where both
$\mu$ and $\sigma$ are unknown.

By an application of either theorem, it is seen that $\bar{X}$ is the UMVU estimate
of $\mu$. Working with the Cramér-Rao inequality regarding $\sigma^2$ as the estimated
parameter, it is seen that the C-R lower bound is equal to $2\sigma^4/n$. Next,
Theorem 7 leads to the UMVU estimate of $\sigma^2$, $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$.
Furthermore, it has been seen (Exercise 1.4) that $\mathrm{Var}_{\sigma^2}(S^2) = \frac{2\sigma^4}{n-1}$, which is strictly
larger than $\frac{2\sigma^4}{n}$. This is the reason the Cramér-Rao inequality approach fails.
For a little more extensive discussion on this example, see, e.g., Example 9,
pages 299-301, in the reference cited in Remark 10.
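
The gap between the variance of $S^2$ and the C-R lower bound can be seen in a small simulation. The sketch below is an added illustration under assumed values of `mu`, `sigma`, `n`, and `reps`; it estimates $\mathrm{Var}(S^2)$ and compares it with $2\sigma^4/(n-1)$ and with $2\sigma^4/n$.

```python
import random
import statistics

def sample_variance(sample):
    """S^2 with divisor n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

rng = random.Random(2)
mu, sigma, n, reps = 1.0, 1.0, 5, 40000   # illustrative values, not from the text
s2_values = [
    sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_s2 = statistics.fmean(s2_values)     # should be close to sigma^2 (unbiasedness)
var_s2 = statistics.pvariance(s2_values)  # simulated variance of S^2
exact = 2 * sigma**4 / (n - 1)            # variance of S^2 quoted in the text
cr_bound = 2 * sigma**4 / n               # C-R lower bound quoted in the text

print("mean of S^2:", mean_s2)
print("Var(S^2):", var_s2, "exact:", exact, "C-R bound:", cr_bound)
```

A small $n$ is used so that the difference between $2\sigma^4/(n-1)$ and $2\sigma^4/n$ is easy to see against simulation noise.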


3.1 If $X$ is a r.v. distributed as $B(n, \theta)$, $\theta \in \Omega = (0, 1)$, show that there is no
unbiased estimate of $1/\theta$.

Hint: If $h(X)$ were such an estimate, then $E_\theta h(X) = \frac{1}{\theta}$ for all $\theta$ ($\in (0, 1)$).
Write out the expectation, set $\frac{\theta}{1-\theta} = t$, and by expanding the
right-hand side, conclude that $\binom{n+1}{0}\,(= 1) = 0$, which, of course, is a
contradiction.





3.2 Let $X_1, \dots, X_n$ be independent r.v.'s with p.d.f. $f(x; \theta) = \theta e^{-\theta x}$, $x > 0$,
$\theta \in \Omega = (0, \infty)$, and let $Y_1$ be the smallest order statistic of the $X_i$'s.
Then, by Example 11 in Chapter 6, the p.d.f. of $Y_1$ is $g_1(y) = n\theta e^{-n\theta y}$,
$y > 0$.

(i) Show that both $\bar{X}$ and $nY_1$ are unbiased estimates of $1/\theta$.
(ii) On the basis of variance considerations, which of these two estimates
would you prefer?


3.3 Let $X_1, \dots, X_n$ be a random sample of size $n$ from the $U(0, \theta)$ distribution,
$\theta \in \Omega = (0, \infty)$, and let $Y_n$ be the largest order statistic of the $X_i$'s. Then:
(i) Employ formula (29) in Chapter 6 in order to obtain the p.d.f. of $Y_n$.
(ii) Use part (i) in order to construct an unbiased estimate of $\theta$ depending
only on $Y_n$.
(iii) By Example 6 here (with $\alpha = 0$ and $\beta = \theta$) in conjunction with
Theorem 3, show that the unbiased estimate in part (ii) depends
only on a sufficient statistic for $\theta$.


3.4 Let $X_1, \dots, X_n$ be a random sample of size $n$ from the $U(\theta_1, \theta_2)$ distribution,
$\theta_1 < \theta_2$, and let $Y_1$ and $Y_n$ be the smallest and the largest order
statistics of the $X_i$'s.
(i) Use formulas (28) and (29) in Chapter 6 to obtain the p.d.f.'s of $Y_1$ and
$Y_n$, and then, by calculating the $E_\theta Y_1$ and $E_\theta Y_n$, construct unbiased
estimates of the mean $(\theta_1 + \theta_2)/2$ and of the range $\theta_2 - \theta_1$ depending
only on $Y_1$ and $Y_n$.
(ii) Employ Example 6 here (with $\alpha = \theta_1$ and $\beta = \theta_2$) in conjunction with
Theorem 3 and Remark 7 in order to show that the unbiased estimates
in part (i) depend only on a set of sufficient statistics for $(\theta_1, \theta_2)$.


3.5 Let $X_1, \dots, X_n$ be a random sample of size $n$ from the $U(\theta, 2\theta)$ distribution,
$\theta \in \Omega = (0, \infty)$, and set:

$$U_1 = \frac{n+1}{2n+1}\, Y_n \quad \text{and} \quad U_2 = \frac{n+1}{5n+4}\,(2Y_n + Y_1),$$

where $Y_1$ and $Y_n$ are the smallest and the largest order statistics, respectively,
of the $X_i$'s.
(i) Use relations (28) and (29) in Chapter 6 in order to obtain the p.d.f.'s
$g_1$ and $g_n$ of $Y_1$ and $Y_n$, respectively.
(ii) By using part (i), show that:

$$E_\theta Y_1 = \frac{n+2}{n+1}\,\theta, \qquad E_\theta Y_n = \frac{2n+1}{n+1}\,\theta.$$

(iii) By means of part (ii), conclude that both $U_1$ and $U_2$ are unbiased
estimates of $\theta$.





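3.6 Refer to Exercise 3.5, and show that:

(i) $E_\theta Y_1^2 = \frac{n^2 + 5n + 8}{(n+1)(n+2)}\,\theta^2$, $\quad \mathrm{Var}_\theta(Y_1) = \frac{n}{(n+1)^2(n+2)}\,\theta^2$.

(ii) $E_\theta Y_n^2 = \frac{2(2n^2 + 4n + 1)}{(n+1)(n+2)}\,\theta^2$, $\quad \mathrm{Var}_\theta(Y_n) = \frac{n}{(n+1)^2(n+2)}\,\theta^2$.


3.7 Refer to Exercise 3.5, and:
(i) Use Exercise 5.3(ii) in Chapter 6 in order to show that the joint p.d.f.
$g_{1n}$ of $Y_1$ and $Y_n$ is given by:

$$g_{1n}(y_1, y_n) = \frac{n(n-1)}{\theta^n}\,(y_n - y_1)^{n-2}, \quad \theta < y_1 < y_n < 2\theta.$$

(ii) Employ part (i) here and also part (ii) of Exercise 3.5 in order to show
that:

$$E_\theta(Y_1 Y_n) = \frac{2n^2 + 7n + 5}{(n+1)(n+2)}\,\theta^2, \qquad \mathrm{Cov}_\theta(Y_1, Y_n) = \frac{\theta^2}{(n+1)^2(n+2)}.$$


3.8 Refer to Exercise 3.5, and:
(i) Use Exercises 3.6 and 3.7(ii) in order to show that:

$$\mathrm{Var}_\theta(U_1) = \frac{n}{(2n+1)^2(n+2)}\,\theta^2, \qquad \mathrm{Var}_\theta(U_2) = \frac{1}{(5n+4)(n+2)}\,\theta^2.$$

(ii) From part (i), conclude that $\mathrm{Var}_\theta(U_2) \le \mathrm{Var}_\theta(U_1)$ for all $\theta$ (with
equality holding only for $n = 1$), so that the (unbiased) estimate $U_2$ is
uniformly better (in terms of variance) than the (unbiased) estimate
$U_1$.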


3.9 Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independent random samples with the
same mean $\theta$ and known variances $\sigma_1^2$ and $\sigma_2^2$, respectively. For any fixed
$c$ with $0 \le c \le 1$, set $U_c = c\bar{X} + (1 - c)\bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are the sample
means of the $X_i$'s and of the $Y_j$'s, respectively. Then:
(i) Show that $U_c$ is an unbiased estimate of $\theta$ for every $c$ as specified
above.
(ii) Calculate the variance of $U_c$, and show that it is minimized for $c = c_0 = m\sigma_2^2/(n\sigma_1^2 + m\sigma_2^2)$.


3.10 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s with mean $\mu$ and variance $\sigma^2$, both unknown.
Then for any known constants $c_1, \dots, c_n$, consider the linear estimate of
$\mu$ defined by: $U_c = \sum_{i=1}^n c_i X_i$.
(i) Identify the condition that the $c_i$'s must satisfy, so that $U_c$ is an
unbiased estimate of $\mu$.
(ii) Show that the sample mean $\bar{X}$ is the unbiased linear estimate of
$\mu$ with the smallest variance (among all unbiased linear estimates
of $\mu$).

Hint: For part (ii), one has to minimize the expression $\sum_{i=1}^n c_i^2$ subject
to the side restriction that $\sum_{i=1}^n c_i = 1$. For this minimization, use
the Lagrange multipliers method, which calls for the minimization of
the function $\phi(c_1, \dots, c_n) = \sum_{i=1}^n c_i^2 + \lambda\left(\sum_{i=1}^n c_i - 1\right)$ with respect to
$c_1, \dots, c_n$, where $\lambda$ is a constant (Lagrange multiplier). Alternatively,
one may employ a geometric argument to the same effect.

In all of the following Exercises 3.11-3.15, employ steps (i)-(iv)
listed after Remark 11 in an attempt to determine UMVU estimates.
Use relation (10) whenever possible.


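3.11 If $X_1, \dots, X_n$ is a random sample from the Poisson distribution $P(\theta)$,
show that the sample mean $\bar{X}$ is the UMVU estimate of $\theta$.


3.12 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s distributed as $N(\mu, \sigma^2)$.
(i) If $\mu = \theta \in \Omega = \mathbb{R}$ and $\sigma$ is known, show that $\bar{X}$ is the UMVU estimate
of $\theta$.
(ii) If $\sigma^2 = \theta \in \Omega = (0, \infty)$ and $\mu$ is known, show that $S^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$
is the UMVU estimate of $\sigma^2$.

Hint: For part (ii), recall that $Y = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi_n^2$, and hence
$E_\theta Y = n$, $\mathrm{Var}_\theta(Y) = 2n$.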


3.13 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s from the Gamma distribution with parameters
$\alpha$ known and $\beta = \theta \in \Omega = (0, \infty)$ unknown.
(i) Determine the Fisher information $I(\theta)$.
(ii) Show that the estimate $U = U(X_1, \dots, X_n) = \frac{1}{n\alpha} \sum_{i=1}^n X_i$ is unbiased
and calculate its variance.
(iii) Show that $\mathrm{Var}_\theta(U) = 1/nI(\theta)$, so that $U$ is the UMVU estimate
of $\theta$.


3.14 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s from the Negative Exponential p.d.f. in the
following parametric form: $f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$, so
that $E_\theta X_1 = \theta$. Use Exercise 3.13 in order to show that $\bar{X}$ is the UMVU
estimate of $\theta$.

Hint: Recall that the Negative Exponential distribution is a special
case of the Gamma distribution.


3.15 Let $X$ be a r.v. with p.d.f. $f(x; \theta) = \frac{1}{2\theta} e^{-|x|/\theta}$, $x \in \mathbb{R}$, $\theta \in \Omega = (0, \infty)$.
Then:
(i) Show that $E_\theta|X| = \theta$, and $E_\theta X^2 = 2\theta^2$.
(ii) Show that the statistic $U = U(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^n |X_i|$ is an unbiased
estimate of $\theta$, and calculate its variance.
(iii) Show that the Fisher information number $I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2} \log f(X; \theta)\right] = 1/\theta^2$.
(iv) Conclude that $U$ is the UMVU estimate of $\theta$.

In Exercises 3.16-3.20, the purpose is to construct the UMVU estimates of
the parameters involved by invoking Theorem 7. Sufficiency can always
be established through Theorem 3; completeness sometimes will be
established, but it will always be assumed when appropriate.


3.16 If $X_1, \dots, X_n$ are independent r.v.'s distributed as $B(1, \theta)$, $\theta \in \Omega = (0, 1)$,
then:
(i) Show that $T = \sum_{i=1}^n X_i$ is sufficient for $\theta$.
(ii) Also, show that $T$ is complete.
(iii) From parts (i) and (ii), conclude that $\bar{X}$ is the UMVU estimate of $\theta$.


3.17 Let $X_1, \dots, X_n$ be a random sample of size $n$ from the $P(\theta)$ distribution,
$\theta \in \Omega = (0, \infty)$. With $T = \sum_{i=1}^n X_i$, it has been seen that the conditional
distribution $P_\theta(X_1 = x \mid T = t)$ is $B(t, \frac{1}{n})$ (see Exercise 2.10(ii) in
Chapter 5) and that $T$ is sufficient for $\theta$ (see Exercise 2.9 here). Show
that $T$ is complete, so that the conclusion reached in Example 17 will be
fully justified.


3.18 Let the independent r.v.'s $X_1, \dots, X_n$ have the Geometric p.d.f. $f(x; \theta) = \theta(1 - \theta)^{x-1}$, $x = 1, 2, \dots$, $\theta \in \Omega = (0, 1)$.
(i) Show that $X$ is both sufficient and complete.
(ii) Show that the estimate $U$ defined by: $U(X) = 1$ if $X = 1$, and
$U(X) = 0$ if $X \ge 2$, is an unbiased estimate of $\theta$.
(iii) Conclude that $U$ is the UMVU estimate of $\theta$ and also an entirely
unreasonable estimate.
(iv) Prove that the variance of $U$ is uniformly bigger than the Cramér-Rao
lower bound. (So, on account of this, the Cramér-Rao inequality
could not produce the UMVU estimate.)

Remark: We have stipulated that an estimate always takes values
in the appropriate parameter space $\Omega$. In order to be consistent with this
stipulation, we take $\Omega = [0, 1]$ in part (ii).


3.19 Let $X_1, \dots, X_n$ be independent r.v.'s with the Negative Exponential p.d.f.
$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$. Then:
(i) $\bar{X}$ is sufficient (and complete, although completeness will not be
established here).
(ii) $\bar{X}$ is the UMVU estimate of $\theta$.


3.20 Let $X_1, \dots, X_n$ be independent r.v.'s distributed as $U(0, \theta)$, $\theta \in \Omega = (0, \infty)$. Then:
(i) The largest order statistic $Y_n$ of the $X_i$'s is sufficient (and complete,
although completeness will not be established here).
(ii) The unbiased estimate $U = \frac{n+1}{n} Y_n$ is the UMVU estimate of $\theta$.
(iii) Explain why the Cramér-Rao inequality approach is not applicable
here.


9.4 Decision-Theoretic Approach to Estimation


In this section, a brief discussion is presented of still another approach to
parameter estimation, which, unlike the previous two approaches, is kind of
penalty driven. In order to introduce the relevant concepts and notation, let
$X_1, \dots, X_n$ be a random sample with p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and let $\delta$ be a
function defined on $\mathbb{R}^n$ into $\Omega$; i.e., $\delta: \mathbb{R}^n \to \Omega$. If $x_1, \dots, x_n$ are the observed
values of $X_1, \dots, X_n$, then the value $\delta(x_1, \dots, x_n)$ is the proposed estimate of $\theta$.
The quality of this estimate is usually measured by its squared distance from the
estimated quantity $\theta$; that is, $[\theta - \delta(x_1, \dots, x_n)]^2$. Denote it by $L[\theta; \delta(x_1, \dots, x_n)]$
and call it a loss function. So $L[\theta; \delta(x_1, \dots, x_n)] = [\theta - \delta(x_1, \dots, x_n)]^2$. The
closer the estimate $\delta(x_1, \dots, x_n)$ is to $\theta$ (on either side of it) the smaller is the
loss we suffer, and vice versa. The objective here is to select $\delta$ in some optimal
way to be discussed below. The first step to this effect is that $\delta$ be selected
so that it minimizes the average loss we suffer by using this estimate. For this
purpose, consider the r.v. $L[\theta; \delta(X_1, \dots, X_n)] = [\theta - \delta(X_1, \dots, X_n)]^2$ and take
its expectation to be denoted by $R(\theta; \delta)$; namely,

$$R(\theta; \delta) = E_\theta[\theta - \delta(X_1, \dots, X_n)]^2 = \begin{cases} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\theta - \delta(x_1, \dots, x_n)]^2 f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n, & \text{for the continuous case}, \\[4pt] \sum_{x_1} \cdots \sum_{x_n} [\theta - \delta(x_1, \dots, x_n)]^2 f(x_1; \theta) \cdots f(x_n; \theta), & \text{for the discrete case}. \end{cases} \qquad (12)$$





The average loss $R(\theta; \delta)$ is called the risk function, corresponding to $\delta$. The
value $R(\theta; \delta)$ is the average loss suffered corresponding to the point $\theta$, when
$\delta$ is used. At this point, there are two options available to us in pursuing the
issue of selecting $\delta$. One is to choose $\delta$, so as to minimize the worst which
can happen to us. More formally, choose $\delta$ so that, for any other estimate $\delta^*$, it
holds that:

$$\sup[R(\theta; \delta);\ \theta \in \Omega] \le \sup[R(\theta; \delta^*);\ \theta \in \Omega].$$

Such an estimate, if it exists, is called minimax (by the fact that it minimizes the
maximum risk). The second option would be to average $R(\theta; \delta)$, with respect
to $\theta$, and then choose $\delta$ to minimize this average. The implementation of this
plan goes as follows. Let $\lambda(\theta)$ be a p.d.f. on $\Omega$ and average $R(\theta; \delta)$ by using this
p.d.f.; let $r(\delta)$ be the resulting average. Thus,

$$r(\delta) = E_\lambda R(\theta; \delta) = \begin{cases} \int_\Omega R(\theta; \delta)\lambda(\theta)\, d\theta, & \text{for the continuous case}, \\[4pt] \sum_{\theta \in \Omega} R(\theta; \delta)\lambda(\theta), & \text{for the discrete case}. \end{cases} \qquad (13)$$

Then select $\delta$ to minimize $r(\delta)$. Such an estimate is called a Bayes estimate,
corresponding to the p.d.f. $\lambda$, and it may be denoted by $\delta_\lambda$. In this context,
the parameter $\theta$ is interpreted as a r.v. taking values in $\Omega$ according to the
p.d.f. $\lambda(\theta)$, which is called a prior or a priori p.d.f. It so happens that, under
minimal assumptions, a Bayes estimate always exists and is given by an
explicit formula.





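THEOREM 8

Suppose $\theta$ is a r.v. of the continuous type with prior p.d.f. $\lambda(\theta)$, and
that the three quantities $\int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta$, $\int_\Omega \theta f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta$,
and $\int_\Omega \theta^2 f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta$ are finite. Then
the Bayes estimate corresponding to $\lambda(\theta)$, $\delta_\lambda$, is given by the expression

$$\delta_\lambda(x_1, \dots, x_n) = \frac{\int_\Omega \theta f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta}{\int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta}. \qquad (14)$$

If $\theta$ is a discrete r.v., all integrals above are to be replaced by summation
signs.


This theorem has the following corollary.


COROLLARY The Bayes estimate $\delta_\lambda(x_1, \dots, x_n)$ defined in relation (14) can
also be calculated thus:

$$\delta_\lambda(x_1, \dots, x_n) = \int_\Omega \theta\, h(\theta \mid x_1, \dots, x_n)\, d\theta, \qquad (15)$$

where $h(\theta \mid x_1, \dots, x_n)$ is the conditional p.d.f. of $\theta$, given $X_i = x_i$, $i = 1, \dots, n$,
which is also called the posterior p.d.f. The integral is to be replaced by a
summation sign in the discrete case.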


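PROOF Observe that $f(x_1; \theta) \cdots f(x_n; \theta)$ is, actually, the joint conditional
p.d.f. of $X_1, \dots, X_n$, given $\theta$, so that the product $f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)$ is the
joint p.d.f. of $X_1, \dots, X_n$ and $\theta$. Then its integral over $\Omega$ is the marginal (joint)
p.d.f. of $X_1, \dots, X_n$, and therefore

$$f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta) \Big/ \int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta$$

is $h(\theta \mid x_1, \dots, x_n)$ as described above. Then expression (14) completes the
proof. ▲

So, in computing $\delta_\lambda(x_1, \dots, x_n)$, one may use relation (14) or, alternatively,
first calculate $h(\theta \mid x_1, \dots, x_n)$ and then apply formula (15).
We now proceed with the justification of (14).

PROOF OF THEOREM 8 All operations below are valid without any further
explanation. The derivations are carried out for the continuous case; in the
discrete case, the integrals are to be replaced by summation signs. By relation (13),

$$r(\delta) = \int_\Omega R(\theta; \delta)\lambda(\theta)\, d\theta = \int_\Omega \left\{ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\theta - \delta(x_1, \dots, x_n)]^2 f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n \right\} \lambda(\theta)\, d\theta$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left\{ \int_\Omega [\theta - \delta(x_1, \dots, x_n)]^2 \lambda(\theta) f(x_1; \theta) \cdots f(x_n; \theta)\, d\theta \right\} dx_1 \cdots dx_n.$$

Then, in order to minimize $r(\delta)$, it suffices to minimize the inner integral for
each $x_1, \dots, x_n$. However,

$$\int_\Omega [\theta - \delta(x_1, \dots, x_n)]^2 \lambda(\theta) f(x_1; \theta) \cdots f(x_n; \theta)\, d\theta$$
$$= \delta^2(x_1, \dots, x_n) \left[ \int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta \right] - 2\delta(x_1, \dots, x_n) \left[ \int_\Omega \theta f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta \right] + \left[ \int_\Omega \theta^2 f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta \right],$$

and this is of the form: $g(t) = at^2 - 2bt + c$, where

$$a = \int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta, \quad b = \int_\Omega \theta f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta, \quad c = \int_\Omega \theta^2 f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta,$$

and

$$t = \delta(x_1, \dots, x_n).$$

The quadratic expression $g(t) = at^2 - 2bt + c$ is minimized for $t = \frac{b}{a}$, since
$g'(t) = 2at - 2b = 0$ gives $t = \frac{b}{a}$ and $g''(t) = 2a > 0$. But $\frac{b}{a}$ is equal to the
right-hand side in expression (14). The proof is complete. ▲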


REMARK 13 In the context of the present section, the function $\delta$ and the
estimate $\delta(x_1, \dots, x_n)$ are also referred to as a decision function and a decision
(associated with the specific outcome $x_1, \dots, x_n$). Hence the title of the
section.


REMARK 14 The Bayes approach presents us with both advantages and
disadvantages. An issue which often arises is how the prior p.d.f. $\lambda(\theta)$ is to
be chosen. People have given various considerations in selecting $\lambda(\theta)$, including
mathematical convenience. Perhaps the most significant advantage of this
approach is that, in selecting the prior $\lambda(\theta)$, we have flexibility in incorporating
whatever information we may have about the parameter $\theta$.

The theorem is illustrated with one example.

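EXAMPLE 19 Let $X_1, \dots, X_n$ be a random sample from the $B(1, \theta)$ distribution, $\theta \in \Omega = (0, 1)$,
and choose $\lambda(\theta)$ to be the so-called Beta density with parameters $\alpha$ and
$\beta$; that is,

$$\lambda(\theta) = \begin{cases} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1 - \theta)^{\beta-1}, & \text{if } \theta \in (0, 1) \\[4pt] 0, & \text{otherwise}. \end{cases} \qquad (16)$$

(For a proof that $\lambda$ is, indeed, a p.d.f., see, e.g., pages 70-71 in the book
A Course in Mathematical Statistics, 2nd edition (1997), Academic Press, by
G. G. Roussas.) Then the Bayes estimate is given by relation (20) below.


DISCUSSION Now, from the definition of the p.d.f. of a Beta distribution
with parameters $\alpha$ and $\beta$, we have

$$\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\, dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}, \qquad (17)$$

and, of course, $\Gamma(\gamma) = (\gamma - 1)\Gamma(\gamma - 1)$. Then, for simplicity, writing $\sum_j x_j$
rather than $\sum_{j=1}^n x_j$ when this last expression appears as an exponent, we have

$$I_1 = \int_\Omega f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \theta^{\sum_j x_j}(1 - \theta)^{n - \sum_j x_j}\, \theta^{\alpha-1}(1 - \theta)^{\beta-1}\, d\theta$$

$$= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \theta^{(\alpha + \sum_j x_j) - 1}(1 - \theta)^{(\beta + n - \sum_j x_j) - 1}\, d\theta,$$

which, by means of (17), becomes as follows:

$$I_1 = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \times \frac{\Gamma\left(\alpha + \sum_{j=1}^n x_j\right)\Gamma\left(\beta + n - \sum_{j=1}^n x_j\right)}{\Gamma(\alpha + \beta + n)}. \qquad (18)$$

Next,

$$I_2 = \int_\Omega \theta f(x_1; \theta) \cdots f(x_n; \theta)\lambda(\theta)\, d\theta = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \theta\, \theta^{\sum_j x_j}(1 - \theta)^{n - \sum_j x_j}\, \theta^{\alpha-1}(1 - \theta)^{\beta-1}\, d\theta$$

$$= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \theta^{(\alpha + \sum_j x_j + 1) - 1}(1 - \theta)^{(\beta + n - \sum_j x_j) - 1}\, d\theta.$$

Once more, relation (17) gives

$$I_2 = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \times \frac{\Gamma\left(\alpha + \sum_{j=1}^n x_j + 1\right)\Gamma\left(\beta + n - \sum_{j=1}^n x_j\right)}{\Gamma(\alpha + \beta + n + 1)}. \qquad (19)$$

Relations (18) and (19) imply, by virtue of (14),

$$\delta(x_1, \dots, x_n) = \frac{\Gamma(\alpha + \beta + n)\,\Gamma\left(\alpha + \sum_{j=1}^n x_j + 1\right)}{\Gamma(\alpha + \beta + n + 1)\,\Gamma\left(\alpha + \sum_{j=1}^n x_j\right)} = \frac{\alpha + \sum_{j=1}^n x_j}{\alpha + \beta + n};$$

that is,

$$\delta(x_1, \dots, x_n) = \frac{\sum_{j=1}^n x_j + \alpha}{n + \alpha + \beta}. \qquad (20)$$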

REMARK 15 When $\alpha = \beta = 1$, the Beta distribution becomes $U(0, 1)$, as
follows from (16), since $\Gamma(2) = 1 \times \Gamma(1) = 1$. In this case, the corresponding
Bayes estimate is $\delta(x_1, \dots, x_n) = \left(\sum_{i=1}^n x_i + 1\right)/(n + 2)$.
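
Relation (20) can be checked against relation (14) numerically. The following sketch is an added illustration: it approximates the two integrals in (14) by the midpoint rule (the Beta normalizing constant cancels in the ratio, so it is omitted) and compares the result with the closed form (20). The data `xs`, the prior parameters, and the number of integration steps are assumptions made here for the example.

```python
def bayes_estimate_numeric(xs, alpha, beta, steps=20000):
    """Approximate the ratio (14) for B(1, theta) data and a Beta(alpha, beta) prior."""
    n, t = len(xs), sum(xs)
    num = den = 0.0
    for k in range(steps):
        theta = (k + 0.5) / steps                      # midpoint of the k-th subinterval
        # Likelihood times the unnormalized prior; the Beta constant cancels in (14).
        weight = theta ** (t + alpha - 1) * (1.0 - theta) ** (n - t + beta - 1)
        num += theta * weight
        den += weight
    return num / den

def bayes_estimate_closed_form(xs, alpha, beta):
    """Relation (20): (sum of x_j + alpha) / (n + alpha + beta)."""
    return (sum(xs) + alpha) / (len(xs) + alpha + beta)

xs = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]                    # hypothetical observed 0/1 values
numeric_beta = bayes_estimate_numeric(xs, 2.0, 3.0)
closed_beta = bayes_estimate_closed_form(xs, 2.0, 3.0)
numeric_uniform = bayes_estimate_numeric(xs, 1.0, 1.0)  # U(0, 1) prior of Remark 15
closed_uniform = bayes_estimate_closed_form(xs, 1.0, 1.0)

print("Beta(2, 3) prior:", numeric_beta, closed_beta)
print("U(0, 1) prior:", numeric_uniform, closed_uniform)
```

With the $U(0, 1)$ prior the estimate is $(\sum x_i + 1)/(n + 2)$, which differs from the sample mean by pulling it toward $1/2$.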

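A minimax estimate is usually found indirectly, by showing that a Bayes
estimate is also minimax. The following theorem tells the story.


THEOREM 9

Let $\delta_\lambda(x_1, \dots, x_n)$ be the Bayes estimate corresponding to the prior p.d.f.
$\lambda(\theta)$, and suppose its risk $R(\theta; \delta_\lambda)$, as given in (12), is independent of
$\theta \in \Omega$. Then $\delta_\lambda(x_1, \dots, x_n)$ is minimax.


PROOF The justification is straightforward and goes like this. Set $R(\theta; \delta_\lambda) = c$,
and let $\delta^* = \delta^*(x_1, \dots, x_n)$ be any other estimate. Then

$$\sup[R(\theta; \delta_\lambda);\ \theta \in \Omega] = c = \int_\Omega c\,\lambda(\theta)\, d\theta = \int_\Omega R(\theta; \delta_\lambda)\lambda(\theta)\, d\theta$$

$$\le \int_\Omega R(\theta; \delta^*)\lambda(\theta)\, d\theta \quad (\text{since } \delta_\lambda \text{ is Bayes})$$

$$\le \sup[R(\theta; \delta^*);\ \theta \in \Omega] \quad (\text{since } \lambda \text{ is a p.d.f. on } \Omega).$$

This completes the proof. ▲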


The following example illustrates this theorem. 


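EXAMPLE 20
Let $X_1,\ldots,X_n$ and $\lambda(\theta)$ be as in Example 19. Then the corresponding Bayes
estimate $\delta$ is given by (20), and the estimate $\delta^*$ given in (21) is minimax.


DISCUSSION By setting $X = \sum_{i=1}^n X_i$ and taking into consideration that
$E_\theta X = n\theta$ and $E_\theta X^2 = n\theta(1-\theta+n\theta)$, we obtain

$$R(\theta;\delta) = E_\theta\Big(\theta - \frac{X+\alpha}{n+\alpha+\beta}\Big)^2
= \frac{1}{(n+\alpha+\beta)^2}\Big\{\big[(\alpha+\beta)^2-n\big]\theta^2 - \big(2\alpha^2+2\alpha\beta-n\big)\theta + \alpha^2\Big\}.$$

By taking $\alpha = \beta = \frac{1}{2}\sqrt{n}$ and denoting by $\delta^*$ the resulting estimate, we have

$$(\alpha+\beta)^2 - n = 0, \qquad 2\alpha^2 + 2\alpha\beta - n = 0,$$

so that

$$R(\theta;\delta^*) = \frac{\alpha^2}{(n+\alpha+\beta)^2} = \frac{n}{4(n+\sqrt{n})^2} = \frac{1}{4(1+\sqrt{n})^2}.$$

Since $R(\theta;\delta^*)$ is independent of $\theta$, Theorem 9 implies that

$$\delta^*(x_1,\ldots,x_n) = \frac{\sum_{i=1}^n x_i + \frac{1}{2}\sqrt{n}}{n+\sqrt{n}} = \frac{2\sqrt{n}\,\bar{x}+1}{2(1+\sqrt{n})} \tag{21}$$

is minimax.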
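
The constancy of the risk of $\delta^*$ can also be checked numerically. The sketch below (function name and parameter values are illustrative assumptions) computes the risk exactly, by summing over the Binomial distribution of $X = \sum_i X_i$, and compares it with $1/[4(1+\sqrt{n})^2]$.

```python
import math

def risk(theta, n, alpha, beta):
    """Exact risk E_theta[(theta - (X + alpha)/(n + alpha + beta))^2], X ~ B(n, theta)."""
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta**x * (1 - theta)**(n - x)
        estimate = (x + alpha) / (n + alpha + beta)
        total += pmf * (theta - estimate) ** 2
    return total

n = 16                                   # hypothetical sample size
a = b = 0.5 * math.sqrt(n)               # alpha = beta = sqrt(n)/2
constant = 1 / (4 * (1 + math.sqrt(n)) ** 2)

for theta in (0.1, 0.3, 0.5, 0.9):
    print(theta, risk(theta, n, a, b), constant)

# For comparison: with alpha = beta = 1 the risk does depend on theta.
print(risk(0.1, n, 1.0, 1.0), risk(0.5, n, 1.0, 1.0))
```

For $n = 16$ the constant is $1/100$; the uniform-prior estimate of Remark 15 has a smaller risk near $\theta = 0.1$ but a larger one at $\theta = 0.5$.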


4.1 Consider one observation from the p.d.f. $f(x;\theta) = (1-\theta)\theta^{x-1}$, $x = 1, 2, \ldots$,
$\theta \in \Omega = (0, 1)$, and let the prior p.d.f. $\lambda$ on $(0, 1)$ be the $U(0, 1)$
distribution. Then, determine:

(i) The posterior p.d.f. of $\theta$, given $X = x$.
(ii) The Bayes estimate of $\theta$, by using relation (15).


4.2 If the r.v. $X$ has the Beta distribution with parameters $\alpha$ and $\beta$; i.e., its
p.d.f. is given by expression (16), then without integration and by using the
recursive property of the Gamma function ($\Gamma(\gamma) = (\gamma-1)\Gamma(\gamma-1)$, $\gamma > 1$),
show that $EX = \alpha/(\alpha+\beta)$.


4.3 In reference to Example 19:

(i) Show that the marginal p.d.f., $h(x_1,\ldots,x_n)$, defined by $h(x_1,\ldots,x_n) = \int_0^1 f(x_1;\theta)\cdots f(x_n;\theta)\lambda(\theta)\,d\theta$ with $f(x;\theta) = \theta^x(1-\theta)^{1-x}$, $x = 0, 1$,
and $\lambda(\theta)$ as in relation (16), is given by:

$$h(x_1,\ldots,x_n) = \frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+t)\,\Gamma(\beta+n-t)}{\Gamma(\alpha)\Gamma(\beta)\,\Gamma(\alpha+\beta+n)},$$

where $t = x_1+\cdots+x_n$. Do it without actually carrying out any
integrations, by taking notice of the form of a Beta p.d.f.

(ii) Show that the posterior p.d.f. of $\theta$, given $X_1 = x_1,\ldots,X_n = x_n$,
$h(\theta \mid x_1,\ldots,x_n)$, is the Beta p.d.f. with parameters $\alpha+t$ and $\beta+n-t$.

(iii) Use the posterior p.d.f. obtained in part (ii) in order to rederive the
Bayes estimate $\delta(x_1,\ldots,x_n)$ given in (20) by utilizing relation (15). Do
it without carrying out any integrations, by using Exercise 4.2.

(iv) Construct a $100(1-\alpha)\%$ Bayes confidence interval for $\theta$; that is,
determine a set $\{\theta \in (0,1);\ h(\theta \mid x_1,\ldots,x_n) \ge c(x_1,\ldots,x_n)\}$, where
$c(x_1,\ldots,x_n)$ is determined by the requirement that the $P_\lambda$-probability
of this set is equal to $1-\alpha$.





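4.4 Let $X_1,\ldots,X_n$ be independent r.v.'s from the $N(\theta, 1)$ distribution, $\theta \in
\Omega = \mathbb{R}$, and on $\mathbb{R}$, consider the p.d.f. $\lambda$ to be that of $N(\mu, 1)$ with $\mu$
known. Then show that the Bayes estimate of $\theta$, $\delta_\lambda(x_1,\ldots,x_n)$, is given
by: $\delta_\lambda(x_1,\ldots,x_n) = \dfrac{n\bar{x}+\mu}{n+1}$.

Hint: By (14), we have to find suitable expressions for the integrals:

$$I_1 = \int_{-\infty}^{\infty}\frac{1}{(\sqrt{2\pi})^n}\exp\Big[-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2\Big]\times\frac{1}{\sqrt{2\pi}}\exp\Big[-\frac{(\theta-\mu)^2}{2}\Big]d\theta,$$

$$I_2 = \int_{-\infty}^{\infty}\theta\,\frac{1}{(\sqrt{2\pi})^n}\exp\Big[-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2\Big]\times\frac{1}{\sqrt{2\pi}}\exp\Big[-\frac{(\theta-\mu)^2}{2}\Big]d\theta.$$

The integrand of $I_1$ is equal to (the constant for the integration):

$$\frac{1}{(\sqrt{2\pi})^n}\exp\Big[-\frac{1}{2}\Big(\sum_{i=1}^n x_i^2+\mu^2\Big)\Big]\times\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{1}{2}\big[(n+1)\theta^2-2(n\bar{x}+\mu)\theta\big]\Big\}.$$

However,

$$(n+1)\theta^2-2(n\bar{x}+\mu)\theta = (n+1)\Big(\theta^2-2\,\frac{n\bar{x}+\mu}{n+1}\,\theta\Big)
= (n+1)\Big[\Big(\theta-\frac{n\bar{x}+\mu}{n+1}\Big)^2-\Big(\frac{n\bar{x}+\mu}{n+1}\Big)^2\Big],$$

so that

$$\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{1}{2}\big[(n+1)\theta^2-2(n\bar{x}+\mu)\theta\big]\Big\}
= \frac{1}{\sqrt{n+1}}\exp\Big[\frac{1}{2}\,\frac{(n\bar{x}+\mu)^2}{n+1}\Big]\times\frac{1}{\sqrt{2\pi}\,(1/\sqrt{n+1})}\exp\Big[-\frac{\big(\theta-\frac{n\bar{x}+\mu}{n+1}\big)^2}{2(1/\sqrt{n+1})^2}\Big],$$

and the second factor is the p.d.f. of $N\big(\frac{n\bar{x}+\mu}{n+1}, \frac{1}{n+1}\big)$. Therefore the
integration produces the constant:

$$\frac{1}{\sqrt{n+1}}\times\frac{1}{(\sqrt{2\pi})^n}\exp\Big\{\frac{1}{2}\Big[\frac{(n\bar{x}+\mu)^2}{n+1}-\Big(\sum_{i=1}^n x_i^2+\mu^2\Big)\Big]\Big\}.$$

Likewise, the integrand in $I_2$ is rewritten thus:

$$\frac{1}{\sqrt{n+1}}\times\frac{1}{(\sqrt{2\pi})^n}\exp\Big\{\frac{1}{2}\Big[\frac{(n\bar{x}+\mu)^2}{n+1}-\Big(\sum_{i=1}^n x_i^2+\mu^2\Big)\Big]\Big\}
\times\theta\,\frac{1}{\sqrt{2\pi}\,(1/\sqrt{n+1})}\exp\Big[-\frac{\big(\theta-\frac{n\bar{x}+\mu}{n+1}\big)^2}{2(1/\sqrt{n+1})^2}\Big],$$

and the second factor, when integrated with respect to $\theta$, is the mean
of the $N\big(\frac{n\bar{x}+\mu}{n+1}, \frac{1}{n+1}\big)$ distribution, which is $\frac{n\bar{x}+\mu}{n+1}$. Dividing then $I_2$ by $I_1$, we
obtain the desired result.


4.5 Refer to Exercise 4.4, and:
(i) By utilizing the derivations in the hint, derive the posterior p.d.f.
$h(\theta \mid x_1,\ldots,x_n)$.
(ii) Construct a $100(1-\alpha)\%$ Bayes confidence interval for $\theta$ as in Exercise
4.3(iv).


4.6 Let $X_1,\ldots,X_n$ be independent r.v.'s distributed as $P(\theta)$, $\theta \in \Omega = (0, \infty)$,
and consider the estimate $\delta(x_1,\ldots,x_n) = \bar{x}$ and the loss function $L(\theta;\delta) =
[\theta-\delta(x_1,\ldots,x_n)]^2/\theta$.
Calculate the risk $R(\theta;\delta) = \frac{1}{\theta}E_\theta[\theta-\delta(X_1,\ldots,X_n)]^2$, and use
Theorem 9 in order to conclude that the estimate $\delta(x_1,\ldots,x_n) = \bar{x}$ is,
actually, minimax.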


9.5 Other Methods of Estimation


In addition to the methods of estimation discussed so far, there are also other 
methods and approaches, such as the so-called minimum Chi-Square method, 
the method of least squares, and the method of moments. The method of least 
squares is usually associated with the so-called linear models, and therefore 
we defer its discussion to a later chapter (see Chapter 13). Here, we are going 
to present only a brief outline of the method of moments, and illustrate it with 
three examples. 

To this end, let $X_1,\ldots,X_n$ be a random sample with p.d.f. $f(\cdot;\theta)$, $\theta \in \Omega \subseteq
\mathbb{R}$, and suppose that $E_\theta X_1 = m_1(\theta)$ is finite. The objective is to estimate $\theta$ by
means of the random sample at hand. By the WLLN,

$$\frac{1}{n}\sum_{i=1}^n X_i = \bar{X}_n \xrightarrow{\;P_\theta\;} m_1(\theta). \tag{22}$$

Therefore, for large $n$, it would make sense to set $\bar{X}_n = m_1(\theta)$ (since it will
be approximately so with probability as close to 1 as one desires), and make
an attempt to solve for $\theta$. Assuming that this can be done and that there is a
unique solution, we declare that solution as the moment estimate of $\theta$.
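
When the equation $\bar{X}_n = m_1(\theta)$ has no convenient closed-form solution, it can be solved numerically. The following is a minimal sketch using bisection; it assumes that $m_1$ is continuous and strictly monotone on the search interval. The function name, the data, and the model with $m_1(\theta) = 1/\theta$ are hypothetical and chosen only for illustration.

```python
def moment_estimate(sample, m1, lo, hi, tol=1e-10):
    """Solve m1(theta) = sample mean for theta by bisection on [lo, hi].

    Assumes m1 is continuous and strictly monotone on [lo, hi], and that
    the sample mean lies between m1(lo) and m1(hi).
    """
    target = sum(sample) / len(sample)
    f_lo = m1(lo) - target
    if f_lo * (m1(hi) - target) > 0:
        raise ValueError("sample mean is not bracketed by m1(lo) and m1(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = m1(mid) - target
        if f_lo * f_mid <= 0:      # root lies in [lo, mid]
            hi = mid
        else:                      # root lies in (mid, hi]
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

data = [0.8, 1.9, 0.4, 1.3, 0.6]   # hypothetical observations, mean 1.0
theta_tilde = moment_estimate(data, lambda th: 1 / th, 0.01, 100.0)
print(theta_tilde)                 # close to 1.0, since 1/theta = 1.0
```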

This methodology applies in principle also in the case that there are $r$
parameters involved, $\theta_1,\ldots,\theta_r$, or, as we say, when $\theta$ has $r$ coordinates, $r \ge 1$.
In such a case, we have to assume that the first $r$ moments of the $X_i$'s are finite;
that is,

$$E_\theta X_1^k = m_k(\theta_1,\ldots,\theta_r) \in \mathbb{R}, \quad k = 1,\ldots,r, \quad \theta = (\theta_1,\ldots,\theta_r).$$

Then form the first $r$ sample moments $\frac{1}{n}\sum_{i=1}^n X_i^k$, $k = 1,\ldots,r$, and equate
them to the corresponding (population) moments; that is,

$$\frac{1}{n}\sum_{i=1}^n X_i^k = m_k(\theta_1,\ldots,\theta_r), \quad k = 1,\ldots,r. \tag{23}$$

The reasoning for doing this is the same as the one explained above in
conjunction with (22). Assuming that we can solve for $\theta_1,\ldots,\theta_r$ in (23), and that
the solutions are unique, we arrive at what we call the moment estimates of
the parameters $\theta_1,\ldots,\theta_r$.

The following examples should help shed some light on the above exposition.


On the basis of the random sample $X_1,\ldots,X_n$ from the $B(1, \theta)$ distribution,
find the moment estimate of $\theta$.


DISCUSSION Here $E_\theta X_1 = \theta$ and there is only one parameter. Thus, it
suffices to set $\bar{X} = \theta$, so that the moment estimate of $\theta$ is the same as the
MLE and the UMVU estimate, but (slightly) different from the Bayes (and the
minimax) estimate.


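On the basis of the random sample $X_1,\ldots,X_n$ from the $N(\mu, \sigma^2)$ distribution
with both $\mu$ and $\sigma^2$ unknown, determine the moment estimates of $\mu$ and $\sigma^2$.


DISCUSSION The conditions referred to above are satisfied here, and,
specifically, $E_\theta X_1 = \mu$, $E_\theta X_1^2 = \sigma^2+\mu^2$, $\theta = (\mu, \sigma^2)$. Here we need the first two
sample moments, $\bar{X}$ and $\frac{1}{n}\sum_{i=1}^n X_i^2$. We have then $\bar{X} = \mu$ and $\frac{1}{n}\sum_{i=1}^n X_i^2 =
\sigma^2+\mu^2$. Hence $\mu = \bar{X}$ and $\sigma^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2 =
\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2$. Thus, the moment estimates are $\tilde{\mu} = \bar{X}$ and $\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2$.
The estimate $\tilde{\mu}$ is identical with the MLE and the UMVU estimate, whereas
$\tilde{\sigma}^2$ is the same as the MLE, but (slightly) different from the UMVU estimate.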


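Let the random sample $X_1,\ldots,X_n$ be from the $U(\alpha, \beta)$ distribution, where
both $\alpha$ and $\beta$ are unknown. Determine their moment estimates.


DISCUSSION Recall that $E_\theta X_1 = \frac{\alpha+\beta}{2}$ and $\sigma_\theta^2(X_1) = \frac{(\alpha-\beta)^2}{12}$, $\theta = (\alpha, \beta)$, so
that:

$$\bar{X} = \frac{\alpha+\beta}{2}, \qquad \frac{1}{n}\sum_{i=1}^n X_i^2 = \frac{(\alpha-\beta)^2}{12}+\Big(\frac{\alpha+\beta}{2}\Big)^2.$$

Hence $\frac{(\alpha-\beta)^2}{12} = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2$, call it $S^2$. Thus, $\beta+\alpha = 2\bar{X}$
and $(\alpha-\beta)^2 = 12S^2$, or $\beta-\alpha = 2S\sqrt{3}$, so that the moment estimates of $\alpha$
and $\beta$ are: $\tilde{\alpha} = \bar{X} - S\sqrt{3}$, $\tilde{\beta} = \bar{X} + S\sqrt{3}$. These estimates are entirely different
from the MLE's of these parameters.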
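
The following short Python sketch computes the moment estimates $\bar{X} \mp S\sqrt{3}$ of the $U(\alpha, \beta)$ example for a small, hypothetical data set. The function name and the data are assumptions made for illustration; note that $S$ here uses the divisor $n$, as in the text.

```python
import math

def uniform_moment_estimates(sample):
    """Moment estimates of (alpha, beta) for U(alpha, beta): mean -/+ S*sqrt(3)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)   # S, divisor n
    return mean - s * math.sqrt(3), mean + s * math.sqrt(3)

data = [2.0, 4.0, 6.0, 8.0]            # hypothetical observations
alpha_tilde, beta_tilde = uniform_moment_estimates(data)
print(alpha_tilde, beta_tilde)         # 5 - sqrt(15) and 5 + sqrt(15)
print(min(data), max(data))            # the sample extremes, for comparison
```

For these data the moment estimates extend beyond the smallest and largest observations, which illustrates how they can differ from estimates based on the sample extremes.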


5.1 Refer to Exercise 1.6, and derive the moment estimate of $\theta$. Also, compare
it with the MLE $\hat{\theta} = 1/\bar{X}$.


5.2 (i) Refer to Exercise 1.7, and derive the moment estimate of $\theta$, $\tilde{\theta}$.
(ii) Find the numerical values of $\tilde{\theta}$ and of the MLE $\hat{\theta}$ (see Exercise 1.7),
if $n = 10$ and:

$x_1 = 0.92$, $x_2 = 0.79$, $x_3 = 0.90$, $x_4 = 0.65$, $x_5 = 0.86$,
$x_6 = 0.47$, $x_7 = 0.73$, $x_8 = 0.97$, $x_9 = 0.94$, and $x_{10} = 0.77$.


5.3 Refer to Exercise 1.8, and derive the moment estimate of $\theta$.


5.4 Refer to Exercise 1.9 and show that $E_\theta|X| = \theta$, and therefore the moment
estimate of $\theta$ is $\tilde{\theta} = \frac{1}{n}\sum_{i=1}^n |X_i|$.


5.5 Refer to Exercise 1.10 and find the expectation of the given p.d.f. by
recalling that, if $X \sim$ Gamma with parameters $\alpha$ and $\beta$, then $EX = \alpha\beta$
(and $\mathrm{Var}(X) = \alpha\beta^2$). Then derive the moment estimate of $\theta$, and compare
it with the MLE found in Exercise 1.10(ii).


5.6 Refer to Exercise 1.11, and:
(i) Show that $EX = \alpha+\beta$ and $EX^2 = \alpha^2+2\alpha\beta+2\beta^2$, where $X$ is
a r.v. with the p.d.f. given in the exercise cited. Also, calculate the
$\mathrm{Var}(X)$.
(ii) Derive the moment estimates of $\alpha$ and $\beta$.


5.7 Let $X_1,\ldots,X_n$ be independent r.v.'s from the $U(\theta-a, \theta+b)$ distribution,
where $a$ and $b$ are positive constants and $\theta \in \Omega = \mathbb{R}$.
Determine the moment estimate $\tilde{\theta}$ of $\theta$, and compute its expectation
and variance.


5.8 If the independent r.v.'s $X_1,\ldots,X_n$ have the $U(-\theta, \theta)$ distribution, $\theta \in
\Omega = (0, \infty)$, how can one construct a moment estimate of $\theta$?


5.9 If the independent r.v.'s $X_1,\ldots,X_n$ have the Gamma distribution with
parameters $\alpha$ and $\beta$, show that the moment estimates of $\alpha$ and $\beta$ are:
$\tilde{\alpha} = \bar{X}^2/S^2$ and $\tilde{\beta} = S^2/\bar{X}$, where $S^2 = \frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2$.

Hint: Recall that, if $X \sim$ Gamma with parameters $\alpha$ and $\beta$, then $EX =
\alpha\beta$, $\mathrm{Var}(X) = \alpha\beta^2$.


5.10 Let $X$ be a r.v. with p.d.f. $f(x;\theta) = \frac{2}{\theta^2}(\theta-x)$, $0 < x < \theta$, $\theta \in \Omega = (0, \infty)$.
Then:
(i) Show that $f(\cdot;\theta)$ is, indeed, a p.d.f.
(ii) Show that $E_\theta X = \frac{\theta}{3}$ and $\mathrm{Var}_\theta(X) = \frac{\theta^2}{18}$.
(iii) On the basis of a random sample of size $n$ from $f(\cdot;\theta)$, find the
moment estimate of $\theta$, $\tilde{\theta}$, and show that it is unbiased. Also, calculate
the variance of $\tilde{\theta}$.


5.11 Let $X$ be a r.v. having the Beta p.d.f. with parameters $\alpha$ and $\beta$; i.e.,
$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$, $0 < x < 1$ ($\alpha, \beta > 0$). Then, by
Exercise 4.2, $EX = \alpha/(\alpha+\beta)$.
(i) Follow the same approach used in proving Exercise 4.2 in order to
establish that $EX^2 = \alpha(\alpha+1)/[(\alpha+\beta)(\alpha+\beta+1)]$.
(ii) On the basis of a random sample of size $n$ from the underlying p.d.f.,
determine the moment estimates of $\alpha$ and $\beta$.





5.12 Let $X$ and $Y$ be any two r.v.'s with finite second moments, so that their
correlation coefficient, $\rho(X, Y)$, is given by $\rho(X, Y) = \mathrm{Cov}(X, Y)/\sigma(X)\sigma(Y)$.
Let $X_i$ and $Y_i$, $i = 1,\ldots,n$, be i.i.d. r.v.'s distributed as the r.v.'s $X$ and
$Y$, respectively. From the expression $\mathrm{Cov}(X, Y) = E[(X-EX)(Y-EY)]$,
it makes sense to estimate $\rho(X, Y)$ by $\hat{\rho}_n(X, Y)$ given by:

$$\hat{\rho}_n(X, Y) = \frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})\Big/\hat{\sigma}(X)\hat{\sigma}(Y),$$

where $\hat{\sigma}(X) = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2}$ and $\hat{\sigma}(Y) = \sqrt{\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2}$.

Then set $EX = \mu_1$, $EY = \mu_2$, $\mathrm{Var}(X) = \sigma_1^2$, $\mathrm{Var}(Y) = \sigma_2^2$, $\rho(X, Y) = \rho$,
and show that:
(i) $\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y}) = \frac{1}{n}\sum_{i=1}^n X_iY_i - \bar{X}\bar{Y}$.

(ii) $E(XY) = \mathrm{Cov}(X, Y)+\mu_1\mu_2 = \rho\sigma_1\sigma_2+\mu_1\mu_2$.

(iii) Use the WLLN (Theorem 3 in Chapter 7) in conjunction with the
Corollary to Theorem 5 in Chapter 7 in order to show that $\hat{\rho}_n(X, Y)
\xrightarrow{P} \rho(X, Y) = \rho$, so that $\hat{\rho}_n(X, Y)$ is a consistent (in the probability
sense) estimate of $\rho$.

(Notice that $\hat{\rho}_n(X, Y)$ is the same as the MLE of $\rho$ for the case that
the pair $(X, Y)$ has the Bivariate Normal distribution; see Exercise
1.14 in this chapter.)


5.13 (i) For any $n$ pairs of real numbers $(\alpha_i, \beta_i)$, $i = 1,\ldots,n$, show that:

$$\Big(\sum_{i=1}^n \alpha_i\beta_i\Big)^2 \le \Big(\sum_{i=1}^n \alpha_i^2\Big)\Big(\sum_{i=1}^n \beta_i^2\Big).$$

Hint: One way of proving it is to consider the function in $\lambda$, $g(\lambda) =
\sum_{i=1}^n (\alpha_i-\lambda\beta_i)^2$, and observe that $g(\lambda) \ge 0$ for all real $\lambda$, and, in
particular, for $\lambda = \big(\sum_{i=1}^n \alpha_i\beta_i\big)/\big(\sum_{i=1}^n \beta_i^2\big)$, which is actually the minimizing
value for $g(\lambda)$.


(ii) Use part (i) in order to show that $[\hat{\rho}_n(X, Y)]^2 \le 1$.


5.14 In reference to Example 25 in Chapter 1, denote by $x_i$ and $y_i$, $i = 1,\ldots,15$,
respectively, the observed measurements for the cross-fertilized and the
self-fertilized pairs. Then calculate the (observed) sample means $\bar{x}$, $\bar{y}$,
sample variances $s_x^2$, $s_y^2$, and the sample s.d.'s $s_x$, $s_y$.


Chapter 10 Confidence Intervals and Confidence Regions


In Section 2 of Chapter 8, the basic concepts about confidence intervals etc.
were introduced; the detailed discussion was deferred to the present chapter.
The point estimation problem, in its simplest form, discussed extensively in
the previous chapter, is as follows: On the basis of a random sample $X_1,\ldots,X_n$
with p.d.f. $f(\cdot;\theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and its observed values $x_1,\ldots,x_n$, construct a
point estimate of $\theta$, call it $\hat{\theta} = \hat{\theta}(x_1,\ldots,x_n)$. Thus, for example, in the $N(\theta, 1)$
case, we are invited to pinpoint a value of $\theta \in \mathbb{R}$ as the (unknown to us but)
true value of $\theta$. Such estimates were, actually, constructed by way of at least
three methods. Also, certain desirable properties of estimates (fixed sample
size properties, as well as asymptotic properties) were established or stated.

Now, declaring that (the unknown value of) $\theta$ is, actually, $\bar{x}$ may look
quite unreasonable. How is it possible to single out one value out of $\mathbb{R}$, $\bar{x}$,
and identify it as the true value of $\theta$? The concept of a confidence interval
with a given confidence coefficient mitigates this seemingly unreasonable
situation. It makes much more sense to declare that $\theta$ lies within an interval in
$\mathbb{R}$ with high confidence. This is, in effect, what we are doing in this chapter
by formulating the questions and problems in rigorous probabilistic/statistical
terms.

The chapter consists of four sections. The first section concerns itself with 
confidence intervals of one real-valued parameter. The following section con- 
siders the same kind of a problem when nuisance (unknown but of no interest 
to us) parameters are present. In the third section, an example is discussed, 
where a confidence region of two parameters is constructed; no general theory 
is developed. (See, however, Theorem 4 in Chapter 12.) In the final section, 
some confidence intervals are constructed with given approximate confidence 
coefficient. 




10.1 Confidence Intervals


We formalize in the form of a definition some concepts already introduced in 
the second section of Chapter 8. 


DEFINITION 1 
Let $X_1,\ldots,X_n$ be a random sample with p.d.f. $f(\cdot;\theta)$, $\theta \in \Omega \subseteq \mathbb{R}$. Then:


(i) A random interval is an interval whose end-points are r.v.'s.

(ii) A confidence interval for $\theta$ with confidence coefficient $1-\alpha$ ($0 <
\alpha < 1$, $\alpha$ small) is a random interval whose end-points are statistics
$L(X_1,\ldots,X_n)$ and $U(X_1,\ldots,X_n)$, say, such that $L(X_1,\ldots,X_n) \le
U(X_1,\ldots,X_n)$ and

$$P_\theta[L(X_1,\ldots,X_n) \le \theta \le U(X_1,\ldots,X_n)] \ge 1-\alpha, \quad \text{for all } \theta \in \Omega. \tag{1}$$

(iii) The statistic $L(X_1,\ldots,X_n)$ is called a lower confidence limit for $\theta$
with confidence coefficient $1-\alpha$, if the interval $[L(X_1,\ldots,X_n), \infty)$
is a confidence interval for $\theta$ with confidence coefficient $1-\alpha$.
Likewise, $U(X_1,\ldots,X_n)$ is said to be an upper confidence limit for $\theta$
with confidence coefficient $1-\alpha$, if the interval $(-\infty, U(X_1,\ldots,X_n)]$
is a confidence interval for $\theta$ with confidence coefficient $1-\alpha$.


REMARK 1 The significance of a confidence interval stems from the relative
frequency interpretation of probability. Thus, on the basis of the observed
values $x_1,\ldots,x_n$ of $X_1,\ldots,X_n$, construct the interval with end-points
$L(x_1,\ldots,x_n)$ and $U(x_1,\ldots,x_n)$, and denote it by $[L_1, U_1]$. Repeat the
underlying random experiment independently another $n$ times and likewise form the
interval $[L_2, U_2]$. Repeat this process a large number of times $N$, independently
each time, and let $[L_N, U_N]$ be the corresponding interval. Then the fact that
$[L(X_1,\ldots,X_n), U(X_1,\ldots,X_n)]$ is a confidence interval for $\theta$ with confidence
coefficient $1-\alpha$ means that approximately $100(1-\alpha)\%$ of the above $N$ intervals
will cover $\theta$, no matter what its value is.
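
The relative frequency interpretation in Remark 1 can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses the Normal-mean interval with known $\sigma$, $\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$ (derived in Example 1 of this section), and hypothetical values of $\mu$, $\sigma$, $n$, and $N$; the function name is likewise hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, repetitions, seed=0):
    """Fraction of simulated intervals Xbar +/- z*sigma/sqrt(n) that cover mu."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - alpha / 2)        # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions

# N = 2000 repetitions of an experiment with n = 25 observations each.
print(coverage(mu=3.0, sigma=2.0, n=25, alpha=0.05, repetitions=2000))
```

The printed fraction should be close to 0.95, and it should remain so when `mu` is changed, in line with the phrase "no matter what its value is".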


REMARK 2 When the underlying r.v.'s are of the continuous type, the
inequalities in the above definition, regarding the confidence coefficient $1-\alpha$,
become equalities.


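REMARK 3 If $L(X_1,\ldots,X_n)$ is a lower confidence limit for $\theta$ with
confidence coefficient $1-\frac{\alpha}{2}$, and $U(X_1,\ldots,X_n)$ is an upper confidence limit for
$\theta$ with confidence coefficient $1-\frac{\alpha}{2}$, then $[L(X_1,\ldots,X_n), U(X_1,\ldots,X_n)]$ is a
confidence interval for $\theta$ with confidence coefficient $1-\alpha$.
Indeed, writing $L$ and $U$ instead of $L(X_1,\ldots,X_n)$ and $U(X_1,\ldots,X_n)$, and
keeping in mind that $L \le U$, we have:

$$\begin{aligned}
P_\theta(L \le \theta) &= P_\theta(L \le \theta, U \ge \theta)+P_\theta(L \le \theta, U < \theta) \\
&= P_\theta(L \le \theta \le U)+P_\theta(U < \theta), \quad \text{since } (U < \theta) \subseteq (L \le \theta),
\end{aligned}$$

and

$$\begin{aligned}
P_\theta(\theta \le U) &= P_\theta(U \ge \theta, L \le \theta)+P_\theta(U \ge \theta, L > \theta) \\
&= P_\theta(L \le \theta \le U)+P_\theta(L > \theta), \quad \text{since } (L > \theta) \subseteq (U \ge \theta).
\end{aligned}$$

Summing them up, we have then

$$P_\theta(L \le \theta)+P_\theta(U \ge \theta) = 2P_\theta(L \le \theta \le U)+P_\theta(U < \theta)+P_\theta(L > \theta),$$

or

$$\begin{aligned}
2P_\theta(L \le \theta \le U) &= P_\theta(L \le \theta)+P_\theta(U \ge \theta)-P_\theta(U < \theta)-P_\theta(L > \theta) \\
&= P_\theta(L \le \theta)+P_\theta(U \ge \theta)-1+P_\theta(U \ge \theta)-1+P_\theta(L \le \theta) \\
&= 2[P_\theta(L \le \theta)+P_\theta(U \ge \theta)-1],
\end{aligned}$$

or

$$P_\theta(L \le \theta \le U) = P_\theta(L \le \theta)+P_\theta(U \ge \theta)-1 \ge 1-\frac{\alpha}{2}+1-\frac{\alpha}{2}-1 = 1-\alpha,$$

as was to be seen.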

This section is concluded with the construction of confidence intervals in 
some concrete examples. In so doing, we draw heavily on distribution theory 
and point estimates. It would be, perhaps, helpful to outline the steps we 
usually follow in constructing a confidence interval. 


(a) Think of a r.v. which contains the parameter $\theta$, the r.v.'s $X_1,\ldots,X_n$,
preferably in the form of a sufficient statistic, and whose distribution is (exactly
or at least approximately) known.

(b) Determine suitable points $a < b$ such that the r.v. in step (a) lies in $[a, b]$
with $P_\theta$-probability $\ge 1-\alpha$.

(c) In the expression of step (b), rearrange the terms to arrive at an interval
with the end-points being statistics and containing $\theta$.

(d) The interval in step (c) is the required confidence interval.


EXAMPLE 1
Let $X_1,\ldots,X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution, where
only one of $\mu$ or $\sigma^2$ is unknown. Construct a confidence interval for it with
confidence coefficient $1-\alpha$.


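DISCUSSION


(i) Let $\mu$ be unknown. The natural r.v. to think of is $\sqrt{n}(\bar{X}-\mu)/\sigma$, which
satisfies the requirements in step (a). Next, determine any two points $a < b$
from the Normal tables for which $P(a \le Z \le b) = 1-\alpha$, where $Z \sim
N(0, 1)$. (See, however, Exercise 1.1 for the best choice of $a$ and $b$.) Since
$\sqrt{n}(\bar{X}-\mu)/\sigma \sim N(0, 1)$, it follows that

$$P_\mu\Big[a \le \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} \le b\Big] = 1-\alpha, \quad \text{for all } \mu,$$

so that step (b) is satisfied. Rearranging the terms inside the square
brackets, we obtain:

$$P_\mu\Big(\bar{X}-\frac{b\sigma}{\sqrt{n}} \le \mu \le \bar{X}-\frac{a\sigma}{\sqrt{n}}\Big) = 1-\alpha, \quad \text{for all } \mu,$$

so that step (c) is fulfilled. In particular, for $b = z_{\alpha/2}$ (recall $P(Z \ge z_{\alpha/2}) =
\frac{\alpha}{2}$) and $a = -z_{\alpha/2}$, we have

$$P_\mu\Big(\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big) = 1-\alpha, \quad \text{for all } \mu.$$

It follows that

$$\Big[\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big] = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad (\text{for brevity}) \tag{2}$$

is the required confidence interval.

(ii) Let $\sigma^2$ be unknown. Set $S^2 = \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2$ and recall that $\frac{nS^2}{\sigma^2} =
\sum_{i=1}^n \big(\frac{X_i-\mu}{\sigma}\big)^2 \sim \chi_n^2$. The r.v. $\frac{nS^2}{\sigma^2}$ satisfies the requirements of step (a). From
the Chi-Square tables, determine any pair $0 < a < b$ for which $P(a \le X \le
b) = 1-\alpha$, where $X \sim \chi_n^2$. Then

$$P_{\sigma^2}\Big(a \le \frac{nS^2}{\sigma^2} \le b\Big) = 1-\alpha, \ \text{for all } \sigma^2, \quad \text{or} \quad P_{\sigma^2}\Big(\frac{nS^2}{b} \le \sigma^2 \le \frac{nS^2}{a}\Big) = 1-\alpha, \ \text{for all } \sigma^2,$$

and steps (b) and (c) are satisfied.
In particular,

$$P_{\sigma^2}\Big(\frac{nS^2}{\chi^2_{n;\alpha/2}} \le \sigma^2 \le \frac{nS^2}{\chi^2_{n;1-\alpha/2}}\Big) = 1-\alpha, \quad \text{for all } \sigma^2,$$

where $P(X \le \chi^2_{n;1-\alpha/2}) = P(X \ge \chi^2_{n;\alpha/2}) = \frac{\alpha}{2}$. It follows that

$$\Big[\frac{nS^2}{\chi^2_{n;\alpha/2}},\ \frac{nS^2}{\chi^2_{n;1-\alpha/2}}\Big], \qquad S^2 = \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2, \tag{3}$$

is the required confidence interval.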
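
Intervals (2) and (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the Normal quantile comes from the standard library, and the Chi-Square quantiles are passed in as arguments (read from tables), since the standard library does not provide them. The values 40.646 and 13.120 used below are the tabulated $\chi^2_{25;0.025}$ and $\chi^2_{25;0.975}$.

```python
from statistics import NormalDist, fmean

def mean_interval(sample, sigma, alpha):
    """Interval (2): Xbar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / n ** 0.5
    xbar = fmean(sample)
    return xbar - half, xbar + half

def variance_interval(sample, mu, chi2_upper, chi2_lower):
    """Interval (3): [n S^2 / chi2_{n;alpha/2}, n S^2 / chi2_{n;1-alpha/2}], mu known.

    chi2_upper and chi2_lower are the tabulated quantiles chi2_{n;alpha/2}
    and chi2_{n;1-alpha/2}, respectively.
    """
    n_s2 = sum((x - mu) ** 2 for x in sample)      # n * S^2
    return n_s2 / chi2_upper, n_s2 / chi2_lower

# Constants for n = 25 and 1 - alpha = 0.95.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2), round(z / 25 ** 0.5, 3))             # 1.96 0.392
print(round(25 / 40.646, 3), round(25 / 13.120, 3))     # 0.615 1.905
```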





Numerical Example Let $n = 25$ and $1-\alpha = 0.95$. For part (i), we have
$z_{\alpha/2} = z_{0.025} = 1.96$, so that $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{X} \pm 1.96 \times \frac{\sigma}{5} = \bar{X} \pm 0.392\sigma$. For
$\sigma = 1$, for example, the required interval is then: $\bar{X} \pm 0.392$. For the second
part, we have $\chi^2_{n;\alpha/2} = \chi^2_{25;0.025} = 40.646$, and $\chi^2_{n;1-\alpha/2} = \chi^2_{25;0.975} = 13.120$. The
required interval is then:

$$\Big[\frac{25S^2}{40.646},\ \frac{25S^2}{13.120}\Big] \simeq [0.615S^2,\ 1.905S^2].$$


EXAMPLE 2
On the basis of the random sample $X_1,\ldots,X_n$ from the $U(0, \theta)$ ($\theta > 0$)
distribution, construct a confidence interval for $\theta$ with confidence coefficient $1-\alpha$.


DISCUSSION It has been seen (just apply Example 6(ii) in Chapter 9 with
$\alpha = 0$ and $\beta = \theta$) that $X = X_{(n)}$ is a sufficient statistic for $\theta$. Also, the p.d.f. of $X$
is given by (see Example 14 in Chapter 9) $f_X(x;\theta) = \frac{n}{\theta^n}x^{n-1}$, $0 \le x \le \theta$. Setting
$Y = X/\theta$, it is easily seen that the p.d.f. of $Y$ is: $f_Y(y) = ny^{n-1}$, $0 \le y \le 1$. The
r.v. $Y$ satisfies the requirements of step (a). Next, determine any $0 \le a < b \le 1$
such that $\int_a^b f_Y(y)\,dy = \int_a^b ny^{n-1}\,dy = \int_a^b dy^n = y^n\big|_a^b = b^n-a^n = 1-\alpha$.
Then

$$P_\theta(a \le Y \le b) = P_\theta\Big(a \le \frac{X}{\theta} \le b\Big) = P_\theta\Big(\frac{X}{b} \le \theta \le \frac{X}{a}\Big) = 1-\alpha, \quad \text{for all } \theta,$$

so that steps (b) and (c) are satisfied. It follows that $\big[\frac{X}{b}, \frac{X}{a}\big] = \big[\frac{X_{(n)}}{b}, \frac{X_{(n)}}{a}\big]$ is the
required confidence interval.

Looking at the length of this interval, $X_{(n)}\big(\frac{1}{a}-\frac{1}{b}\big)$, setting $a = a(b)$ and
minimizing with respect to $b$, we find that the shortest interval is taken for
$b = 1$ and $a = \alpha^{1/n}$. That is, $\big[X_{(n)}, \frac{X_{(n)}}{\alpha^{1/n}}\big]$. (See also Exercise 1.5.)

Numerical Example For $n = 32$ and $1-\alpha = 0.95$, we get (approximately)
$[X_{(32)}, 1.098X_{(32)}]$.
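
A brief Python sketch of the interval in Example 2 follows. The function name is hypothetical; the comparison with the "equal-tails" choice $a = (\alpha/2)^{1/n}$, $b = (1-\alpha/2)^{1/n}$ is added here only as an illustration that another admissible pair $(a, b)$ gives a longer interval.

```python
def uniform_interval(sample, alpha):
    """Shortest interval of Example 2: [X_(n), X_(n) / alpha**(1/n)]."""
    n = len(sample)
    largest = max(sample)                  # X_(n), the largest order statistic
    return largest, largest / alpha ** (1 / n)

n, alpha = 32, 0.05
factor = alpha ** (-1 / n)                 # right end-point is factor * X_(n)
print(round(factor, 3))                    # 1.098

# Length factors (1/a - 1/b) for two admissible choices with b^n - a^n = 1 - alpha.
shortest = 1 / alpha ** (1 / n) - 1                                   # a = alpha^(1/n), b = 1
equal_tails = 1 / (alpha / 2) ** (1 / n) - 1 / (1 - alpha / 2) ** (1 / n)
print(shortest, equal_tails)               # the first is the smaller one
```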








1.1 Let $\Phi$ be the d.f. of the $N(0, 1)$ distribution, and let $a < b$ be such that
$\Phi(b)-\Phi(a) = \gamma$, some fixed number with $0 < \gamma < 1$. Show that the
length $b-a$ of the interval $(a, b)$ is minimum, if, for some $c > 0$, $b = c$
and $a = -c$.


1.2 If $X_1,\ldots,X_n$ are independent r.v.'s distributed as $N(\mu, \sigma^2)$ with $\mu$
unknown and $\sigma$ known, then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
given by $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ (see Example 1(i)). Suppose that the length of this
interval is 7.5 and we wish to halve it. What sample size $m = m(n)$ will
be needed?

Hint: Set $m = cn$ and determine $c$.


1.3 The stray-load loss (in watts) for a certain type of induction motor, when
the line current is held at 10 amps for a speed of 1,500 rpm, is a r.v.
$X \sim N(\mu, 9)$.
(i) Compute a 99% confidence interval for $\mu$ when $n = 100$ and $\bar{x} = 58.3$.
(ii) Determine the sample size $n$, if the length of the 99% confidence
interval is required to be 1.


1.4 If the independent r.v.'s $X_1,\ldots,X_n$ are distributed as $N(\theta, \sigma^2)$ with $\sigma$
known, the $100(1-\alpha)\%$ confidence interval for $\theta$ is given by $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
(see Example 1(i)).
(i) If the length of the confidence interval is to be equal to a preassigned
number $l$, determine the sample size $n$ as a function of $l$, $\sigma$,
and $\alpha$.

(ii) Compute the numerical value of $n$, if $l = 0.1$, $\sigma = 1$, and $\alpha = 0.05$.


1.5 Refer to Example 2 and show that the shortest confidence
interval is, indeed, $[X_{(n)}, X_{(n)}/\alpha^{1/n}]$ as asserted.

Hint: Set $a = a(b)$, differentiate $g(b) = \frac{1}{a}-\frac{1}{b}$ with respect to $b$,
and use the derivative of $b^n-a^n = 1-\alpha$ in order to show that
$\frac{dg(b)}{db} < 0$, so that $g(b)$ is decreasing. Conclude that $g(b)$ is minimized for
$b = 1$.


1.6 Let $X_1,\ldots,X_n$ be independent r.v.'s with the Negative Exponential p.d.f.
given in the form $f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$. Then:
(i) By using the m.g.f. approach, show that the r.v. $U = \sum_{i=1}^n X_i$ has the
Gamma distribution with parameters $\alpha = n$ and $\beta = \theta$.
(ii) Also, show that the r.v. $V = \frac{2U}{\theta}$ is distributed as $\chi^2_{2n}$.
(iii) By means of part (ii), construct a confidence interval for $\theta$ with
confidence coefficient $1-\alpha$.


1.7 If $X$ is a r.v. with the Negative Exponential p.d.f. $f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}$, $x >
0$, $\theta \in \Omega = (0, \infty)$, then, by Exercise 2.2 in Chapter 9, the reliability
$R(x;\theta) = P_\theta(X > x) = e^{-x/\theta}$. If $X_1,\ldots,X_n$ is a random sample of size
$n$ from this p.d.f., use Exercise 1.6(iii) in order to construct a confidence
interval for $R(x;\theta)$ with confidence coefficient $1-\alpha$.


1.8 Let $X_1,\ldots,X_n$ be a random sample of size $n$ from the p.d.f. $f(x;\theta) =
e^{-(x-\theta)}$, $x > \theta$, $\theta \in \Omega = \mathbb{R}$, and let $Y_1$ be the smallest order statistic of
the $X_i$'s.

(i) Use formula (28) in order to show that the p.d.f. of $Y_1$, call it $g$, is
given by: $g(y) = ne^{-n(y-\theta)}$, $y > \theta$.
(ii) Set $T(\theta) = 2n(Y_1-\theta)$ and show that $T \sim \chi^2_2$.
(iii) Use part (ii) in order to show that a $100(1-\alpha)\%$ confidence interval
for $\theta$, based on $T(\theta)$, is given by: $\big[Y_1-\frac{b}{2n}, Y_1-\frac{a}{2n}\big]$, for suitable
$0 < a < b$; a special choice of $a$ and $b$ is: $a = \chi^2_{2;1-\alpha/2}$ and $b = \chi^2_{2;\alpha/2}$.


1.9 Let the independent r.v.'s $X_1,\ldots,X_n$ have the Weibull distribution with
parameters $\gamma$ and $\theta$ with $\theta \in \Omega = (0, \infty)$ and $\gamma > 0$ known; i.e., their
p.d.f. $f(\cdot;\theta)$ is given by:

$$f(x;\theta) = \frac{\gamma}{\theta}x^{\gamma-1}e^{-x^\gamma/\theta}, \quad x > 0.$$

(i) For $i = 1,\ldots,n$, set $Y_i = X_i^\gamma$ and show that the p.d.f. of $Y_i$, $g(\cdot;\theta)$,
is Negative Exponential parameterized as follows: $g(y;\theta) = \frac{1}{\theta}e^{-y/\theta}$,
$y > 0$.

(ii) For $i = 1,\ldots,n$, set $T_i(\theta) = \frac{2Y_i}{\theta}$ and show that the p.d.f. of $T_i(\theta)$,
$g_T(\cdot;\theta)$, is that of a $\chi^2_2$ distributed r.v., and conclude that the r.v.
$T(\theta) = \sum_{i=1}^n T_i(\theta) \sim \chi^2_{2n}$.
(iii) Show that a $100(1-\alpha)\%$ confidence interval for $\theta$, based on $T(\theta)$, is
of the form $\big[\frac{2Y}{b}, \frac{2Y}{a}\big]$, for suitable $0 < a < b$, where $Y = \sum_{i=1}^n X_i^\gamma$. In
particular, $a$ and $b$ may be chosen to be $\chi^2_{2n;1-\alpha/2}$ and $\chi^2_{2n;\alpha/2}$, respectively.


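1.10 If the independent r.v.'s $X_1,\ldots,X_n$ have p.d.f. $f(x;\theta) = \frac{1}{2\theta}e^{-|x|/\theta}$, $x \in
\mathbb{R}$, $\theta \in \Omega = (0, \infty)$, then:
(i) The independent r.v.'s $Y_i = |X_i|$, $i = 1,\ldots,n$, have the Negative
Exponential p.d.f. $g(y;\theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$.
(ii) The independent r.v.'s $T_i(\theta) = \frac{2Y_i}{\theta}$, $i = 1,\ldots,n$, are $\chi^2_2$-distributed,
so that the r.v. $T(\theta) = \sum_{i=1}^n T_i(\theta) = \frac{2}{\theta}\sum_{i=1}^n Y_i = \frac{2Y}{\theta} \sim \chi^2_{2n}$, where
$Y = \sum_{i=1}^n Y_i = \sum_{i=1}^n |X_i|$.
(iii) A $100(1-\alpha)\%$ confidence interval for $\theta$, based on $T(\theta)$, is given by
$\big[\frac{2Y}{b}, \frac{2Y}{a}\big]$, for suitable $0 < a < b$. In particular, $a$ and $b$ may be chosen
to be $a = \chi^2_{2n;1-\alpha/2}$, $b = \chi^2_{2n;\alpha/2}$.


1.11 Consider the p.d.f. $f(x;\alpha,\beta) = \frac{1}{\beta}e^{-(x-\alpha)/\beta}$, $x \ge \alpha$, $\alpha \in \mathbb{R}$, $\beta > 0$ (see
Exercise 1.11 in Chapter 9), and suppose that $\beta$ is known and $\alpha$ is unknown,
and denote it by $\theta$. Thus, we have here:

$$f(x;\theta) = \frac{1}{\beta}e^{-(x-\theta)/\beta}, \quad x \ge \theta, \quad \theta \in \Omega = \mathbb{R}.$$

(i) Show that the corresponding d.f., $F(\cdot;\theta)$, is given by: $F(x;\theta) =
1-e^{-(x-\theta)/\beta}$, $x \ge \theta$, so that $1-F(x;\theta) = e^{-(x-\theta)/\beta}$, $x \ge \theta$.

(ii) Let $X_1,\ldots,X_n$ be independent r.v.'s drawn from the p.d.f. $f(\cdot;\theta)$, and
let $Y_1$ be the smallest order statistic. Use relation (28) in Chapter 6
in order to show that the p.d.f. of $Y_1$ is given by:

$$f_{Y_1}(y;\theta) = \frac{n}{\beta}e^{-n(y-\theta)/\beta}, \quad y \ge \theta.$$

(iii) Consider the r.v. $T = T_n(\theta)$ defined by: $T = n(Y_1-\theta)/\beta$, and show
that its p.d.f. is given by: $f_T(t) = e^{-t}$, $t \ge 0$.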


1.12 In reference to Exercise 1.11:
(i) Determine $0 \le a < b$, so that $P(a \le T \le b) = 1-\alpha$, for some
$0 < \alpha < 1$.

(ii) By part (i), $P_\theta\big[a \le \frac{n(Y_1-\theta)}{\beta} \le b\big] = 1-\alpha$, since $T$ has the p.d.f.
$f_T(t) = e^{-t}$, $t \ge 0$. Use this relation to conclude that $\big[Y_1-\frac{b\beta}{n}, Y_1-\frac{a\beta}{n}\big]$
is a $100(1-\alpha)\%$ confidence interval of $\theta$.

(iii) The length $l$ of the confidence interval in part (ii) is $l = \frac{\beta}{n}(b-a)$. Set
$b = b(a)$ and show that the shortest confidence interval is given by:
$\big[Y_1+\frac{\beta\log\alpha}{n}, Y_1\big]$.

Hint: For part (iii), set $b = b(a)$, and from $e^{-a}-e^{-b} = 1-\alpha$, obtain
$\frac{db}{da} = e^{b-a}$ by differentiation. Then replace it in $\frac{dl}{da}$ and observe that it
is $> 0$. This implies that $l$ attains its minimum at $a = 0$.


1.13 Let $X_1,\ldots,X_n$ be independent r.v.'s with d.f. $F$ and p.d.f. $f$ with $f(x) > 0$
for $-\infty \le a < x < b \le \infty$, and let $Y_1$ and $Y_n$ be, respectively, the smallest
and the largest order statistics of the $X_i$'s.
(i) By using the hint given below, show that the joint p.d.f., $f_{Y_1,Y_n}$, of $Y_1$
and $Y_n$ is given by:

$$f_{Y_1,Y_n}(y_1, y_n) = n(n-1)[F(y_n)-F(y_1)]^{n-2}f(y_1)f(y_n), \quad a < y_1 < y_n < b.$$

Hint: $P(Y_n \le y_n) = P(Y_1 \le y_1, Y_n \le y_n)+P(Y_1 > y_1, Y_n \le y_n) =
F_{Y_1,Y_n}(y_1, y_n)+P(y_1 < Y_1 < Y_n \le y_n)$. But $P(Y_n \le y_n) = P(\text{all } X_i\text{'s} \le
y_n) = P(X_1 \le y_n,\ldots,X_n \le y_n) = P(X_1 \le y_n)\cdots P(X_n \le y_n) =
[F(y_n)]^n$, and: $P(y_1 < Y_1 < Y_n \le y_n) = P(\text{all } X_i\text{'s are} > y_1 \text{ and also}
\le y_n) = P(y_1 < X_1 \le y_n,\ldots,y_1 < X_n \le y_n) = P(y_1 < X_1 \le y_n)\cdots
P(y_1 < X_n \le y_n) = [P(y_1 < X_1 \le y_n)]^n = [F(y_n)-F(y_1)]^n$. Thus,

$$[F(y_n)]^n = F_{Y_1,Y_n}(y_1, y_n)+[F(y_n)-F(y_1)]^n, \quad a < y_1 < y_n < b.$$

Solving for $F_{Y_1,Y_n}(y_1, y_n)$ and taking the partial derivatives with respect
to $y_1$ and $y_n$, we get the desired result.

(ii) Find the p.d.f. $f_{Y_1,Y_n}$ when the $X_i$'s are distributed as $U(0, \theta)$, $\theta \in
\Omega = (0, \infty)$.

(iii) Do the same for the case the $X_i$'s have the Negative Exponential
p.d.f. $f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$.


1.14 Refer to Exercise 1.13(ii), and show that the p.d.f. of the range $R = Y_n-Y_1$
is given by:

$$f_R(r;\theta) = \frac{n(n-1)}{\theta^n}r^{n-2}(\theta-r), \quad 0 < r < \theta.$$

1.15 Refer to Exercise 1.13(iii), and show that the p.d.f. of the range $R = Y_n-Y_1$
is given by:

$$f_R(r;\theta) = \frac{n-1}{\theta}e^{-r/\theta}\big(1-e^{-r/\theta}\big)^{n-2}, \quad r > 0.$$


1.16 In reference to Exercise 1.14:
(i) Set $T = \frac{R}{\theta}$ and show that $f_T(t) = n(n-1)t^{n-2}(1-t)$, $0 < t < 1$.
(ii) Take $0 < c < 1$ such that $P(c \le T \le 1) = 1-\alpha$, and construct
a confidence interval for $\theta$, based on the range $R$, with confidence
coefficient $1-\alpha$. Also, show that $c$ is a root of the equation $c^{n-1} \times
[n-(n-1)c] = \alpha$.


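1.17 Consider the independent random samples $X_1,\ldots,X_m$ from the
$N(\mu_1, \sigma_1^2)$ distribution and $Y_1,\ldots,Y_n$ from the $N(\mu_2, \sigma_2^2)$ distribution,
where $\mu_1$, $\mu_2$ are unknown and $\sigma_1^2$, $\sigma_2^2$ are known, and define the r.v.
$T = T_{m,n}(\mu_1-\mu_2)$ by:

$$T_{m,n}(\mu_1-\mu_2) = \frac{(\bar{X}_m-\bar{Y}_n)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/m+\sigma_2^2/n}}.$$

Then show that:

(i) A $100(1-\alpha)\%$ confidence interval for $\mu_1-\mu_2$, based on $T$, is given
by: $\Big[(\bar{X}_m-\bar{Y}_n)-b\sqrt{\frac{\sigma_1^2}{m}+\frac{\sigma_2^2}{n}},\ (\bar{X}_m-\bar{Y}_n)-a\sqrt{\frac{\sigma_1^2}{m}+\frac{\sigma_2^2}{n}}\Big]$ for suitable
constants $a$ and $b$.

(ii) The confidence interval in part (i) with the shortest length is taken
for $b = z_{\alpha/2}$ and $a = -z_{\alpha/2}$.


1.18 Refer to Exercise 1.17, and suppose that $\mu_1$, $\mu_2$ are known and $\sigma_1^2$, $\sigma_2^2$
are unknown. Then define the r.v. $T = T_{m,n}(\sigma_1^2/\sigma_2^2) = \frac{\sigma_1^2}{\sigma_2^2} \times \frac{S_Y^2}{S_X^2}$, where $S_X^2 =
\frac{1}{m}\sum_{i=1}^m (X_i-\mu_1)^2$ and $S_Y^2 = \frac{1}{n}\sum_{j=1}^n (Y_j-\mu_2)^2$, and show that a $100(1-\alpha)\%$
confidence interval for $\sigma_1^2/\sigma_2^2$, based on $T$, is given by $\Big[a\frac{S_X^2}{S_Y^2}, b\frac{S_X^2}{S_Y^2}\Big]$ for
$0 < a < b$ with $P(a \le X \le b) = 1-\alpha$, $X \sim F_{n,m}$. In particular, we may
choose $a = F_{n,m;1-\alpha/2}$ and $b = F_{n,m;\alpha/2}$.


1.19 Consider the independent random samples $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$
from the Negative Exponential distributions $f(x;\theta_1) = \frac{1}{\theta_1}e^{-x/\theta_1}$, $x >
0$, $\theta_1 \in \Omega = (0, \infty)$, and $f(y;\theta_2) = \frac{1}{\theta_2}e^{-y/\theta_2}$, $y > 0$, $\theta_2 \in \Omega = (0, \infty)$,
and set $U = \sum_{i=1}^m X_i$, $V = \sum_{j=1}^n Y_j$. Then, by Exercise 1.6(ii), $\frac{2U}{\theta_1} \sim
\chi^2_{2m}$, $\frac{2V}{\theta_2} \sim \chi^2_{2n}$, and they are independent. It follows that

$$\frac{\frac{2V}{\theta_2}\big/2n}{\frac{2U}{\theta_1}\big/2m} = \frac{\theta_1}{\theta_2}\times\frac{mV}{nU} = \frac{\theta_1}{\theta_2}\times\frac{m\sum_{j=1}^n Y_j}{n\sum_{i=1}^m X_i} \sim F_{2n,2m}.$$

Use this result in order to construct a $100(1-\alpha)\%$ confidence interval for
$\theta_1/\theta_2$.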


10.2 Confidence Intervals in the Presence of Nuisance Parameters


In Example 1, the position was adopted that only one of the parameters in the
$N(\mu, \sigma^2)$ distribution was unknown. This is a rather artificial assumption as,
in practice, both $\mu$ and $\sigma^2$ are most often unknown. What was done in that
example did, however, pave the way to solving the problem here in its natural
setting.


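EXAMPLE 3
Let $X_1,\ldots,X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution, where both
$\mu$ and $\sigma^2$ are unknown. Construct confidence intervals for $\mu$ and $\sigma^2$, each with
confidence coefficient $1-\alpha$.


DISCUSSION We have that:

$$\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} \sim N(0, 1) \quad \text{and} \quad \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \Big(\frac{X_i-\bar{X}}{\sigma}\Big)^2 \sim \chi^2_{n-1},$$

where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2$, and these two r.v.'s are independent. It follows
that

$$\frac{\sqrt{n}(\bar{X}-\mu)/\sigma}{\sqrt{(n-1)S^2/\sigma^2(n-1)}} = \frac{\sqrt{n}(\bar{X}-\mu)}{S} \sim t_{n-1}.$$

From the $t$-tables, determine any pair
$(a, b)$ with $a < b$ such that $P(a \le X \le b) = 1-\alpha$, where $X \sim t_{n-1}$. It follows
that:

$$P_\theta\Big[a \le \frac{\sqrt{n}(\bar{X}-\mu)}{S} \le b\Big] = 1-\alpha, \quad \text{for all } \theta = (\mu, \sigma^2),$$

or

$$P_\theta\Big(\bar{X}-b\frac{S}{\sqrt{n}} \le \mu \le \bar{X}-a\frac{S}{\sqrt{n}}\Big) = 1-\alpha, \quad \text{for all } \theta.$$

In particular,

$$P_\theta\Big(\bar{X}-t_{n-1;\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X}+t_{n-1;\alpha/2}\frac{S}{\sqrt{n}}\Big) = 1-\alpha, \quad \text{for all } \theta,$$

where $P(X \ge t_{n-1;\alpha/2}) = \frac{\alpha}{2}$ (and $X \sim t_{n-1}$). It follows that the required
confidence interval for $\mu$ is:

$$\Big[\bar{X}-t_{n-1;\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X}+t_{n-1;\alpha/2}\frac{S}{\sqrt{n}}\Big] = \bar{X} \pm t_{n-1;\alpha/2}\frac{S}{\sqrt{n}} \quad (\text{for brevity}). \tag{4}$$

The construction of a confidence interval for $\sigma^2$ in the presence of (an
unknown) $\mu$ is easier. We have already mentioned that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$. Then
repeat the process in Example 1(ii), replacing $\chi^2_n$ by $\chi^2_{n-1}$, to obtain the
confidence interval

$$\Big[\frac{(n-1)S^2}{\chi^2_{n-1;\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1;1-\alpha/2}}\Big], \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2. \tag{5}$$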





REMARK 4 Observe that the confidence interval in (4) differs from that in
(2) in that $\sigma$ in (2) is replaced by an estimate $S$, and then the constant $z_{\alpha/2}$
in (2) is adjusted to $t_{n-1;\alpha/2}$. Likewise, the confidence intervals in (3) and (5)
are of the same form, with the only difference that (the unknown) $\mu$ in (3) is
replaced by its estimate $\bar{X}$ in (5). The constants $n$, $\chi^2_{n;\alpha/2}$, and $\chi^2_{n;1-\alpha/2}$ are also
adjusted as indicated in (5).

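Numerical Example Let $n = 25$ and $1-\alpha = 0.95$. Then $t_{n-1;\alpha/2} = t_{24;0.025} =
2.0639$, and the interval in (4) becomes $\bar{X} \pm 0.41278S$. Also, $\chi^2_{n-1;\alpha/2} = \chi^2_{24;0.025} =
39.364$, $\chi^2_{n-1;1-\alpha/2} = \chi^2_{24;0.975} = 12.401$, so that the interval in (5) is
$\big[\frac{24S^2}{39.364}, \frac{24S^2}{12.401}\big] \simeq [0.610S^2, 1.935S^2]$.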
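
Intervals (4) and (5) can be sketched in Python as follows. The function names and the data are hypothetical; the $t$ and Chi-Square quantiles are passed in as arguments (taken from tables, as in the Numerical Example above), since the standard library does not provide them. Note that `statistics.stdev` and `statistics.variance` use the divisor $n-1$, in agreement with the definition of $S^2$ in (5).

```python
from statistics import fmean, stdev, variance

def t_interval(sample, t_quantile):
    """Interval (4): Xbar +/- t_{n-1;alpha/2} * S / sqrt(n)."""
    n = len(sample)
    half = t_quantile * stdev(sample) / n ** 0.5    # stdev uses divisor n - 1
    xbar = fmean(sample)
    return xbar - half, xbar + half

def variance_interval(sample, chi2_upper, chi2_lower):
    """Interval (5): [(n-1)S^2 / chi2_{n-1;alpha/2}, (n-1)S^2 / chi2_{n-1;1-alpha/2}]."""
    n = len(sample)
    s2 = variance(sample)                            # S^2 with divisor n - 1
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

data = [float(k) for k in range(1, 26)]              # hypothetical sample, n = 25
print(t_interval(data, 2.0639))                      # t_{24;0.025} from the tables
print(variance_interval(data, 39.364, 12.401))       # chi2_{24;0.025}, chi2_{24;0.975}
print(round(2.0639 / 5, 5), round(24 / 39.364, 3), round(24 / 12.401, 3))
```

The last line reproduces the constants 0.41278, 0.61, and 1.935 of the Numerical Example.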

Actually, a somewhat more important problem from a practical viewpoint
is that of constructing confidence intervals for the difference of the means
of two normal populations and the ratio of their variances. This is a way of
comparing two normal populations. The precise formulation of the problem
is given below.





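EXAMPLE 4 Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be two independent random samples from the
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ distributions, respectively, with all $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$
unknown. We wish to construct confidence intervals for $\mu_1 - \mu_2$ and $\sigma_1^2/\sigma_2^2$.

DISCUSSION
(i) Confidence interval for $\mu_1 - \mu_2$. In order to be able to resolve this problem,
we have to assume that the variances, although unknown, are equal; i.e.,
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, say.
Let us review briefly some distribution results. Recall that $\bar{X} - \mu_1 \sim N(0, \frac{\sigma^2}{m})$,
$\bar{Y} - \mu_2 \sim N(0, \frac{\sigma^2}{n})$, and by independence,

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim N(0, 1). \qquad (6)$$

Also, if

$$S_X^2 = \frac{1}{m-1}\sum_{i=1}^{m}(X_i - \bar{X})^2, \quad S_Y^2 = \frac{1}{n-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2,$$

then $\frac{(m-1)S_X^2}{\sigma^2} \sim \chi^2_{m-1}$, $\frac{(n-1)S_Y^2}{\sigma^2} \sim \chi^2_{n-1}$, and by independence,

$$\frac{(m-1)S_X^2 + (n-1)S_Y^2}{\sigma^2} \sim \chi^2_{m+n-2}. \qquad (7)$$

From (6) and (7), we obtain then:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)}} \sim t_{m+n-2}. \qquad (8)$$

Then working with (8) as in Example 1(i), we arrive at the following
confidence interval

$$\left[(\bar{X} - \bar{Y}) - t_{m+n-2;\alpha/2}\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)},\ (\bar{X} - \bar{Y}) + t_{m+n-2;\alpha/2}\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)}\right]$$

$$= (\bar{X} - \bar{Y}) \pm t_{m+n-2;\alpha/2}\sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\left(\frac{1}{m} + \frac{1}{n}\right)}. \qquad (9)$$

(ii) Confidence interval for $\sigma_1^2/\sigma_2^2$. By the fact that $\frac{(m-1)S_X^2}{\sigma_1^2} \sim \chi^2_{m-1}$, $\frac{(n-1)S_Y^2}{\sigma_2^2} \sim \chi^2_{n-1}$,
and independence, we have $\frac{S_Y^2/\sigma_2^2}{S_X^2/\sigma_1^2} \sim F_{n-1,m-1}$. From the
F-tables, determine any pair (a, b) with 0 < a < b such that $P(a \le X \le b) = 1 - \alpha$,
where $X \sim F_{n-1,m-1}$. Then, for all $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$,

$$P_\theta\left(a \le \frac{S_Y^2/\sigma_2^2}{S_X^2/\sigma_1^2} \le b\right) = 1 - \alpha, \quad \text{or} \quad P_\theta\left(a\frac{S_X^2}{S_Y^2} \le \frac{\sigma_1^2}{\sigma_2^2} \le b\frac{S_X^2}{S_Y^2}\right) = 1 - \alpha.$$

In particular, for all $\theta$,

$$P_\theta\left(\frac{S_X^2}{S_Y^2}F_{n-1,m-1;1-\alpha/2} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_X^2}{S_Y^2}F_{n-1,m-1;\alpha/2}\right) = 1 - \alpha,$$

where $P(X < F_{n-1,m-1;1-\alpha/2}) = P(X > F_{n-1,m-1;\alpha/2}) = \frac{\alpha}{2}$ (and $X \sim F_{n-1,m-1}$).
The required confidence interval is then

$$\left[\frac{S_X^2}{S_Y^2}F_{n-1,m-1;1-\alpha/2},\ \frac{S_X^2}{S_Y^2}F_{n-1,m-1;\alpha/2}\right]. \qquad (10)$$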


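Numerical Example Let m = 13, n = 14, and $1 - \alpha = 0.95$. Then
$t_{m+n-2;\alpha/2} = t_{25;0.025} = 2.0595$, so that the interval in (9) becomes

$$(\bar{X} - \bar{Y}) \pm 2.0595\sqrt{\frac{12S_X^2 + 13S_Y^2}{25}\left(\frac{1}{13} + \frac{1}{14}\right)} \simeq (\bar{X} - \bar{Y}) \pm 0.1586\sqrt{12S_X^2 + 13S_Y^2}.$$

Next, $F_{n-1,m-1;\alpha/2} = F_{13,12;0.025} = 3.2388$, $F_{n-1,m-1;1-\alpha/2} = F_{13,12;0.975} = \frac{1}{F_{12,13;0.025}} \simeq 0.3171$.
Therefore the interval in (10) is $\left[0.3171\frac{S_X^2}{S_Y^2},\ 3.2388\frac{S_X^2}{S_Y^2}\right]$.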
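As a rough sketch of how (9) and (10) translate into a computation, the following Python fragment takes the tabulated quantiles as inputs. The function names and the two data sets are hypothetical and serve only as an illustration; the quantiles are the ones quoted in the Numerical Example.

```python
import math
from statistics import mean, variance

def pooled_t_interval(xs, ys, t_quantile):
    """Interval (9) for mu_1 - mu_2, assuming equal (unknown) variances."""
    m, n = len(xs), len(ys)
    pooled = ((m - 1) * variance(xs) + (n - 1) * variance(ys)) / (m + n - 2)
    half_width = t_quantile * math.sqrt(pooled * (1 / m + 1 / n))
    diff = mean(xs) - mean(ys)
    return diff - half_width, diff + half_width

def variance_ratio_interval(xs, ys, f_lower, f_upper):
    """Interval (10) for sigma_1^2 / sigma_2^2.

    f_lower = F_{n-1,m-1;1-alpha/2} and f_upper = F_{n-1,m-1;alpha/2}.
    """
    ratio = variance(xs) / variance(ys)  # S_X^2 / S_Y^2
    return ratio * f_lower, ratio * f_upper

# Table values quoted in the Numerical Example (m = 13, n = 14).
T_25 = 2.0595
F_LOWER, F_UPPER = 0.3171, 3.2388

# Hypothetical data, made up only to exercise the formulas.
xs = [5 + 0.2 * ((3 * i) % 7) for i in range(13)]
ys = [4.5 + 0.3 * ((5 * j) % 9) for j in range(14)]
diff_ci = pooled_t_interval(xs, ys, T_25)
ratio_ci = variance_ratio_interval(xs, ys, F_LOWER, F_UPPER)
```

With m = 13 and n = 14, the half-width returned by `pooled_t_interval` is about 0.1586 times $\sqrt{12S_X^2 + 13S_Y^2}$, as in the text.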


2.1 If the independent r.v.'s $X_1, \ldots, X_n$ are $N(\mu, \sigma^2)$ distributed with both $\mu$
and $\sigma^2$ unknown, construct a $100(1 - \alpha)\%$ confidence interval for $\sigma$.


2.2 Refer to Exercise 1.18 and suppose that all $\mu_1$, $\mu_2$, and $\sigma_1^2$, $\sigma_2^2$ are unknown.
Then construct a $100(1 - \alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$.





10.3 A Confidence Region for $(\mu, \sigma^2)$ in the $N(\mu, \sigma^2)$ Distribution


Refer again to Example 1 and suppose that both $\mu$ and $\sigma^2$ are unknown, as is
most often the case. In this section, we wish to construct a confidence region
for the pair $(\mu, \sigma^2)$; i.e., a subset of the plane determined in terms of statistics
and containing $(\mu, \sigma^2)$ with probability $1 - \alpha$. This problem is resolved in the
following example.


EXAMPLE 5 On the basis of the random sample $X_1, \ldots, X_n$ from the $N(\mu, \sigma^2)$ distribution,
construct a confidence region for the pair $(\mu, \sigma^2)$ with confidence coefficient
$1 - \alpha$.


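DISCUSSION In solving this problem, we draw heavily on what we have
done in the previous example. Let $\bar{X}$ be the sample mean and define $S^2$ by
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then

$$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1), \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad (11)$$

and the two r.v.'s involved here are independent. From the Normal tables,
define c > 0 so that $P(-c \le Z \le c) = \sqrt{1 - \alpha}$, $Z \sim N(0, 1)$; c is uniquely
determined. From the $\chi^2$ tables, determine a pair (a, b) with 0 < a < b and
$P(a \le X \le b) = \sqrt{1 - \alpha}$, where $X \sim \chi^2_{n-1}$. Then, by means of (11), and with
$\theta = (\mu, \sigma^2)$, we have:

$$P_\theta\left[-c \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le c\right] = \sqrt{1 - \alpha}, \quad P_\theta\left[a \le \frac{(n-1)S^2}{\sigma^2} \le b\right] = \sqrt{1 - \alpha}. \qquad (12)$$

These relations are rewritten thus:

$$P_\theta\left[-c \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le c\right] = P_\theta\left[(\mu - \bar{X})^2 \le \frac{c^2\sigma^2}{n}\right] = \sqrt{1 - \alpha}, \qquad (13)$$

$$P_\theta\left[a \le \frac{(n-1)S^2}{\sigma^2} \le b\right] = P_\theta\left[\frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}\right] = \sqrt{1 - \alpha}, \qquad (14)$$

so that, by means of (12)-(14) and independence, we have:

$$P_\theta\left[-c \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le c,\ a \le \frac{(n-1)S^2}{\sigma^2} \le b\right]
= P_\theta\left[-c \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le c\right] P_\theta\left[a \le \frac{(n-1)S^2}{\sigma^2} \le b\right]$$

$$= P_\theta\left[(\mu - \bar{X})^2 \le \frac{c^2\sigma^2}{n},\ \frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}\right] = 1 - \alpha. \qquad (15)$$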


Let $\bar{x}$ and $s^2$ be the observed values of $\bar{X}$ and $S^2$. Then in a system of orthogonal
$(\mu, \sigma^2)$-axes, the equation $(\mu - \bar{x})^2 = \frac{c^2}{n}\sigma^2$ is the equation of a parabola with
vertex V located at the point $(\bar{x}, 0)$, with focus F with coordinates $(\bar{x}, \frac{c^2}{4n})$, and
with directrix L with equation $\sigma^2 = -\frac{c^2}{4n}$ (see Figure 10.1). Then the part of
the plane for which $(\mu - \bar{x})^2 \le \frac{c^2\sigma^2}{n}$ is the inner part of the parabola along with
the points on the parabola. Since

$$\sigma^2 = \frac{(n-1)s^2}{b} = \frac{1}{b}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{and} \quad \sigma^2 = \frac{(n-1)s^2}{a} = \frac{1}{a}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

are straight lines parallel to the $\mu$-axis, the set of points $(\mu, \sigma^2)$ in the plane,
which satisfy simultaneously all inequalities:

$$\frac{1}{b}\sum_{i=1}^{n}(x_i - \bar{x})^2 \le \sigma^2 \le \frac{1}{a}\sum_{i=1}^{n}(x_i - \bar{x})^2, \quad (\mu - \bar{x})^2 \le \frac{c^2\sigma^2}{n}$$

is the part of the plane between the straight lines mentioned above and the
inner part of the parabola (along with the points on the parabola) (see shaded
area in Figure 10.1).

From relation (15), it follows then that, when replacing $\bar{x}$ by $\bar{X}$ and $s^2$ by $S^2$,
the shaded region with random boundary (determined completely as described
above) becomes the required confidence region for $(\mu, \sigma^2)$. What is depicted
in Figure 10.1 is a realization of such a confidence region, evaluated for the
observed values of the $X_i$'s.

Figure 10.1 Confidence Region for $(\mu, \sigma^2)$ with Confidence Coefficient $1 - \alpha$

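Actually, the point c above is $z_\gamma$, where $\gamma = (1 - \sqrt{1 - \alpha})/2$, and for definiteness,
we may choose to split the probability $1 - \sqrt{1 - \alpha}$ equally among the two
tails of the Chi-Square distribution. Thus, we take $b = \chi^2_{n-1;\gamma}$ and $a = \chi^2_{n-1;1-\gamma}$.
Then the confidence region is:

$$(\mu - \bar{X})^2 \le \frac{z_\gamma^2}{n}\sigma^2, \quad \frac{1}{\chi^2_{n-1;\gamma}}\sum_{i=1}^{n}(X_i - \bar{X})^2 \le \sigma^2 \le \frac{1}{\chi^2_{n-1;1-\gamma}}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$

$$\gamma = (1 - \sqrt{1 - \alpha})/2. \qquad (16)$$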


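Numerical Example As a numerical example, take n = 25 and $\alpha = 0.05$,
so that $\gamma \simeq 0.012661$, and (by linear interpolation) $z_\gamma \simeq 2.236$, $\chi^2_{24;\gamma} \simeq 42.338$,
$\chi^2_{24;1-\gamma} \simeq 11.130$, and the confidence region becomes:

$$(\mu - \bar{X})^2 \le 0.199988\sigma^2, \quad 0.023619\sum_{i=1}^{25}(X_i - \bar{X})^2 \le \sigma^2 \le 0.089847\sum_{i=1}^{25}(X_i - \bar{X})^2$$

or, approximately,

$$(\mu - \bar{X})^2 \le 0.2\sigma^2, \quad 0.024\sum_{i=1}^{25}(X_i - \bar{X})^2 \le \sigma^2 \le 0.09\sum_{i=1}^{25}(X_i - \bar{X})^2.$$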
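The constants of the region (16) and a membership check can be sketched as follows. This is an illustration only: the function names are hypothetical, $z_\gamma$ is obtained from the standard library's Normal quantile function rather than by table interpolation, and the two Chi-Square quantiles must be supplied by the user from tables.

```python
import math
from statistics import NormalDist, mean

def region_constants(alpha):
    """Return (gamma, c) with gamma = (1 - sqrt(1 - alpha)) / 2 and c = z_gamma."""
    gamma = (1 - math.sqrt(1 - alpha)) / 2
    c = NormalDist().inv_cdf(1 - gamma)  # upper gamma-point of N(0, 1)
    return gamma, c

def in_region(mu, sigma2, xs, c, chi2_upper, chi2_lower):
    """Check whether the point (mu, sigma2) satisfies the inequalities in (16).

    chi2_upper = chi2_{n-1;gamma}, chi2_lower = chi2_{n-1;1-gamma} (from tables).
    """
    n = len(xs)
    x_bar = mean(xs)
    sum_sq = sum((x - x_bar) ** 2 for x in xs)
    inside_parabola = (mu - x_bar) ** 2 <= c ** 2 * sigma2 / n
    between_lines = sum_sq / chi2_upper <= sigma2 <= sum_sq / chi2_lower
    return inside_parabola and between_lines

gamma, c = region_constants(0.05)
```

For $\alpha = 0.05$ and n = 25, `c ** 2 / 25` is close to 0.2, matching the approximate form of the region given above.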


REMARK 5 A somewhat general theory for constructing confidence regions
is discussed in Chapter 12 (see Theorem 4 there and the examples following
it).


10.4 Confidence Intervals with Approximate Confidence Coefficient


It is somewhat conspicuous that in this chapter we have not yet dealt with
examples such as the Binomial, the Poisson, and the Negative Exponential.
There is a reason for this, however: the expressions which
would serve as the basis for constructing confidence intervals do not have
a known exact distribution. They do have, however, an approximate Normal
distribution, and this fact leads to the construction of confidence intervals with
approximate (rather than exact) confidence coefficient $1 - \alpha$. The remainder
of this section is devoted to constructing such intervals.


EXAMPLE 6 On the basis of the random sample $X_1, \ldots, X_n$ from the $B(1, \theta)$ distribution,
construct a confidence interval for $\theta$ with confidence coefficient approximately
$1 - \alpha$.


DISCUSSION The tools employed here, as well as in the following two
examples, are the CLT and the WLLN in conjunction with either Theorem 7(ii)
or Theorem 6(iii) in Chapter 7. It will be assumed throughout that n is large
enough, so that these theorems apply.

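Recall that $E_\theta X_1 = \theta$ and $\sigma_\theta^2(X_1) = \theta(1 - \theta)$, so that, by the CLT,

$$\frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\theta(1 - \theta)}} \simeq N(0, 1). \qquad (17)$$

In the denominator in (17), replace $\theta(1 - \theta)$ by $S_n^2$, where
$S_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2 = \frac{1}{n}\left(\sum_{i=1}^{n}X_i^2 - n\bar{X}_n^2\right) = \frac{1}{n}\left(\sum_{i=1}^{n}X_i - n\bar{X}_n^2\right) = \bar{X}_n - \bar{X}_n^2 = \bar{X}_n(1 - \bar{X}_n)$,
in order to obtain (by Theorem 7(ii) in Chapter 7),

$$\frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n(1 - \bar{X}_n)}} \simeq N(0, 1). \qquad (18)$$

It follows from (18) that

$$P_\theta\left[-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n(1 - \bar{X}_n)}} \le z_{\alpha/2}\right] \simeq 1 - \alpha, \quad \text{for all } \theta.$$

This expression is equivalent to:

$$P_\theta\left[\bar{X}_n - z_{\alpha/2}\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{n}} \le \theta \le \bar{X}_n + z_{\alpha/2}\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{n}}\right] \simeq 1 - \alpha, \quad \text{for all } \theta,$$

which leads to the confidence interval

$$\left[\bar{X}_n - z_{\alpha/2}\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{n}},\ \bar{X}_n + z_{\alpha/2}\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{n}}\right] = \bar{X}_n \pm z_{\alpha/2}\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{n}} \qquad (19)$$

with confidence coefficient approximately $1 - \alpha$.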
Numerical Example For n = 100 and $1 - \alpha = 0.95$, the confidence interval
in (19) becomes: $\bar{X}_n \pm 1.96\sqrt{\frac{\bar{X}_n(1 - \bar{X}_n)}{100}} = \bar{X}_n \pm 0.196\sqrt{\bar{X}_n(1 - \bar{X}_n)}$.
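A minimal Python sketch of the interval (19) follows. The function name and the observed count are hypothetical; $z_{\alpha/2}$ is computed with the standard library's Normal quantile function instead of being read from tables, and n is assumed large enough for the approximation to be reasonable.

```python
import math
from statistics import NormalDist

def binomial_approx_interval(successes, n, alpha):
    """Interval (19): x_bar +/- z_{alpha/2} * sqrt(x_bar * (1 - x_bar) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    x_bar = successes / n
    half_width = z * math.sqrt(x_bar * (1 - x_bar) / n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical outcome: 40 successes in n = 100 Bernoulli trials.
low, high = binomial_approx_interval(40, 100, 0.05)
```

Here the half-width is approximately $0.196\sqrt{0.4 \times 0.6}$, consistent with the Numerical Example.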
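EXAMPLE 7 Construct a confidence interval for $\theta$ with confidence coefficient approximately
$1 - \alpha$ on the basis of the random sample $X_1, \ldots, X_n$ from the $P(\theta)$
distribution.


DISCUSSION Here $E_\theta X_1 = \sigma_\theta^2(X_1) = \theta$, so that, working as in the previous
example, and employing Theorem 6(iii) in Chapter 7, we have

$$\frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\theta}} \simeq N(0, 1), \quad \text{or} \quad \frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n}} \simeq N(0, 1).$$

Hence $P_\theta\left[-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X}_n - \theta)}{\sqrt{\bar{X}_n}} \le z_{\alpha/2}\right] \simeq 1 - \alpha$, for all $\theta$, which leads to the required
confidence interval

$$\left[\bar{X}_n - z_{\alpha/2}\sqrt{\frac{\bar{X}_n}{n}},\ \bar{X}_n + z_{\alpha/2}\sqrt{\frac{\bar{X}_n}{n}}\right] = \bar{X}_n \pm z_{\alpha/2}\sqrt{\frac{\bar{X}_n}{n}}. \qquad (20)$$

Numerical Example For n = 100 and $1 - \alpha = 0.95$, the confidence interval
in (20) becomes: $\bar{X}_n \pm 0.196\sqrt{\bar{X}_n}$.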





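EXAMPLE 8 Let $X_1, \ldots, X_n$ be a random sample from the Negative Exponential distribution
in the following parameterization: $f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}$, x > 0. Construct a
confidence interval for $\theta$ with confidence coefficient approximately $1 - \alpha$.


DISCUSSION In the adopted parameterization above, $E_\theta X_1 = \theta$ and
$\sigma_\theta^2(X_1) = \theta^2$. Then working as in the previous example, we have that

$$\frac{\sqrt{n}(\bar{X}_n - \theta)}{\theta} \simeq N(0, 1), \quad \text{or} \quad \frac{\sqrt{n}(\bar{X}_n - \theta)}{\bar{X}_n} \simeq N(0, 1).$$

It follows that the required confidence interval is given by:

$$\left[\bar{X}_n - z_{\alpha/2}\frac{\bar{X}_n}{\sqrt{n}},\ \bar{X}_n + z_{\alpha/2}\frac{\bar{X}_n}{\sqrt{n}}\right] = \bar{X}_n \pm z_{\alpha/2}\frac{\bar{X}_n}{\sqrt{n}}. \qquad (21)$$

Numerical Example For n = 100 and $1 - \alpha = 0.95$, the confidence interval
in (21) becomes: $\bar{X}_n \pm 0.196\bar{X}_n$.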
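The Poisson interval (20) and the Negative Exponential interval (21) differ from each other only in the estimated standard deviation, which the sketch below makes explicit. The function names and both data sets are hypothetical; the data are made up (not simulated from either distribution) and merely exercise the formulas.

```python
import math
from statistics import NormalDist, mean

def poisson_approx_interval(xs, alpha):
    """Interval (20): x_bar +/- z_{alpha/2} * sqrt(x_bar / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x_bar = mean(xs)
    half_width = z * math.sqrt(x_bar / len(xs))
    return x_bar - half_width, x_bar + half_width

def exponential_approx_interval(xs, alpha):
    """Interval (21): x_bar +/- z_{alpha/2} * x_bar / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x_bar = mean(xs)
    half_width = z * x_bar / math.sqrt(len(xs))
    return x_bar - half_width, x_bar + half_width

# Hypothetical samples of size n = 100.
counts = [(3 * i) % 7 for i in range(100)]
times = [0.5 + 0.01 * i for i in range(100)]
poisson_ci = poisson_approx_interval(counts, 0.05)
exponential_ci = exponential_approx_interval(times, 0.05)
```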





4.1 Let the independent r.v.'s $X_1, \ldots, X_n$ have unknown (finite) mean $\mu$ and
known (finite) variance $\sigma^2$, and suppose that n is large. Then:
(i) Use the CLT in order to construct a confidence interval for $\mu$ with
approximate confidence coefficient $1 - \alpha$.
(ii) Provide the form of the interval in part (i) for n = 100, $\sigma = 1$, and
$\alpha = 0.05$.
(iii) Refer to part (i) and suppose that $\sigma = 1$ and $\alpha = 0.05$. Then determine
the sample size n, so that the length of the confidence interval is 0.1.
(iv) Observe that the length of the confidence interval in part (i) tends to
0 as $n \to \infty$, for any $\sigma$ and any $\alpha$.


4.2 Refer to Exercise 4.1, and suppose that both $\mu$ and $\sigma^2$ are unknown. Then:
(i) Construct a confidence interval for $\mu$ with approximate confidence
coefficient $1 - \alpha$.
(ii) Provide the form of the interval in part (i) for n = 100 and $\alpha = 0.05$.
(iii) Show that the length of the interval in part (i) tends to 0 in probability
as $n \to \infty$.

Hint: For part (i), refer to Theorem 7(ii) in Chapter 7, and for part (iii),
refer to Theorem 7(i) and Theorem 6(ii) in the same chapter.


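4.3 (i) Let $X \sim N(\mu, \sigma^2)$, and for $0 < \alpha < 1$, let $x_\alpha$ and $x_{1-\alpha}$ be the $\alpha$th and
$(1 - \alpha)$th quantiles, respectively, of X; i.e., $P(X \le x_\alpha) = P(X \ge x_{1-\alpha}) = \alpha$,
so that $P(x_\alpha \le X \le x_{1-\alpha}) = 1 - 2\alpha$. Show that $x_\alpha = \mu + \sigma\Phi^{-1}(\alpha)$,
$x_{1-\alpha} = \mu + \sigma\Phi^{-1}(1 - \alpha)$, so that $[x_\alpha, x_{1-\alpha}] = [\mu + \sigma\Phi^{-1}(\alpha),\ \mu + \sigma\Phi^{-1}(1 - \alpha)]$.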

(ii) Refer to Exercise 4.5(i) of Chapter 9 (see also Exercise 4.4 there),
where it is found that the posterior p.d.f. of $\theta$, given $X_1 = x_1, \ldots, X_n = x_n$,
$h(\cdot \mid x_1, \ldots, x_n)$, is $N\left(\frac{n\bar{x} + \mu}{n+1}, \frac{1}{n+1}\right)$.
Use part (i) in order to find the expression of the interval $[x_\alpha, x_{1-\alpha}]$
here.

Remark: In the present context, the interval $[x_\alpha, x_{1-\alpha}]$ is called a
prediction interval for $\theta$ with confidence coefficient $1 - 2\alpha$.

(iii) Compute the prediction interval in part (ii) when n = 9, $\mu = 1$, $\bar{x} = 1.5$,
and $\alpha = 0.025$.


4.4 Let $X_1, \ldots, X_n$ be independent r.v.'s with strictly increasing d.f. F, and let
$Y_i$ be the ith order statistic of the $X_i$'s, $1 \le i \le n$. For 0 < p < 1, let $x_p$ be
the (unique) pth quantile of F. Then:

(i) Show that for any i and j with $1 \le i < j \le n$,

$$P(Y_i \le x_p \le Y_j) = \sum_{k=i}^{j-1}\binom{n}{k}p^k q^{n-k} \quad (q = 1 - p).$$

Thus, $[Y_i, Y_j]$ is a confidence interval for $x_p$ with confidence coefficient
$\sum_{k=i}^{j-1}\binom{n}{k}p^k q^{n-k}$. This probability is often referred to as probability of
coverage of $x_p$.

(ii) For n = 10 and p = 0.25, identify the respective coverage probabilities
for the pairs $(Y_1, Y_3)$, $(Y_1, Y_4)$, $(Y_2, Y_4)$, $(Y_2, Y_5)$.

(iii) For p = 0.50, do the same as in part (ii) for the pairs (Ya, Ya), (Ya, Y7), 
(Ya, Ys), (Y5, Y7). 
(iv) For p = 0.75, do the same as in part (ii) for the pairs $(Y_8, Y_{10})$, $(Y_7, Y_{10})$,
$(Y_7, Y_9)$, $(Y_6, Y_9)$.


Hint: For part (i), observe that: $P(Y_i \le x_p) = P(\text{at least } i \text{ of } X_1, \ldots, X_n \le x_p) = \sum_{k=i}^{n}\binom{n}{k}p^k q^{n-k}$,
since $P(X_k \le x_p) = p$ and q = 1 - p. Also,
$P(Y_i \le x_p) = P(Y_i \le x_p, Y_j \ge x_p) + P(Y_i \le x_p, Y_j < x_p) = P(Y_i \le x_p \le Y_j) + P(Y_j < x_p)$,
so that $P(Y_i \le x_p \le Y_j) = P(Y_i \le x_p) - P(Y_j \le x_p)$.
For part (iv), observe that $\binom{n}{k}p^k q^{n-k} = \binom{n}{n-r}q^r p^{n-r} = \binom{n}{r}q^r p^{n-r}$ (by setting
n - k = r and recalling that $\binom{n}{n-r} = \binom{n}{r}$).


4.5 Let X be a r.v. with a strictly increasing d.f. F, and let p be a number with
0 < p < 1. Consider the event: $A_p = \{F(X) \le p\} = \{s \in S;\ F(X(s)) \le p\} = \{s \in S;\ X(s) \le F^{-1}(p)\}$.
So, $A_p$ is the event in the underlying sample
space S for the sample points s of which $F(X(s)) \le p$. Since for each
fixed x, F(x) represents the proportion of the (unit) distribution mass of
F which is covered (or carried) by the interval $(-\infty, x]$, it follows that the
random interval $(-\infty, X]$ covers (carries) the (random) proportion F(X)
of the distribution mass of F, and on the event $A_p$, the random interval
$(-\infty, X]$ covers (carries) at most 100p% of the mass of F. Equivalently,
the random interval $(X, \infty)$ covers (carries) at least 100(1 - p)% of the
distribution mass of F.

Use Theorem 10 in Chapter 6 in order to show that $P(A_p) = p$; i.e.,
$(-\infty, X]$ covers at most 100p% of the distribution mass of F with probability
p. Equivalently, the random interval $(X, \infty)$ covers at least 100(1 - p)%
of the distribution mass of F with probability p.


Chapter 11
Testing Hypotheses


In this chapter, the problem of testing hypotheses is considered to some ex- 
tent. Additional topics are discussed in Chapter 12. The chapter consists of 
four sections, the first of which is devoted to some general concepts and the 
formulation of a null hypothesis and its alternative. A number of examples 
discussed provide sufficient motivation for what is done in this section. 

Section 2 is somewhat long and enters into the essence of the testing 
hypotheses issue. Specifically, the Neyman—Pearson Fundamental Lemma is 
stated, and the main points of its proof are presented for the case that the un- 
derlying r.v.’s are of the continuous type. It is stated that this result by itself is 
of limited use; nevertheless, it does serve as the stepping stone in establishing 
other more complicated and truly useful results. This is obtained when the 
underlying family of distributions is the so-called family of distributions of the 
exponential type. Thus, the definition of an exponential type p.d.f. follows, and 
it is next illustrated by means of examples that such families occur fairly often. 
In an exponential type p.d.f. (in the real-valued parameter 0), uniformly most 
powerful (UMP) tests are presented for one-sided and two-sided hypotheses, 
which arise in practice in a natural way. This is done in Theorems 2 and 3. 

In the following section, Theorems 2 and 3 are applied to concrete cases, 
such as the Binomial distribution, the Poisson distribution, and the Normal 
distribution. All applications are accompanied by numerical examples. 

The last section of this chapter, Section 4, is also rather extensive and 
deals with Likelihood Ratio (LR) tests. General concepts, the necessary nota- 
tion, and some motivation for the tests used are given. The better part of the 
section is devoted to deriving LR tests in Normal distributions. The problem 
is divided into two parts. The first part considers the case where we are deal- 
ing with one sample from an underlying Normal distribution, and LR tests are 
derived for the mean and the variance of the distribution. In the second part, 
two independent random samples are available coming from two underlying 
Normal populations. Then LR tests are derived in comparing the means and the 
variances of the distributions. In all cases, the results produced are illustrated
by means of numerical examples.


11.1 General Concepts, Formulation of Some Testing Hypotheses


In order to motivate the formulation of a null hypothesis and its alternative, 
consider some specific examples. Most of them are taken from Chapter 1. 


EXAMPLE 1 In reference to Example 6 in Chapter 1, let $\theta$ be the unknown proportion of
unemployed workers, and let $\theta_0$ be an acceptable level of unemployment, e.g.,
$\theta_0 = 6.25\%$. Then the parameter space is split into the sets (0.0625, 1) and
(0, 0.0625], and one of them will be associated with the (null) hypothesis.
It is proposed that that set be (0.0625, 1); i.e., $H_0$: $\theta > 0.0625$ (and therefore
$H_A$: $\theta \le 0.0625$). The rule of thumb for selecting $H_0$ is this: "Select as
null hypothesis that hypothesis whose false rejection has the most serious
consequences." Indeed, if $\theta$ is, actually, greater than 6.25% and $H_0$ is (falsely)
rejected, then human suffering may occur, due to the fact that the authorities
in charge had no incentives to take the necessary measures. On the other hand,
if $\theta \le 0.0625$ was selected as the null hypothesis and was falsely rejected, then
the most likely consequence would be for the authorities to undertake some
unnecessary measures and, perhaps, waste some money. However, the former
consequence is definitely more serious than the latter. Another way of looking
at the problem of determining the null hypothesis is to formulate as such
a position, which we wish to challenge, and which we are willing to accept
only in the face of convincing evidence, provided by the interested party. To
summarize then, if X is the r.v. denoting the number of unemployed workers
among n sampled, then $X \sim B(n, \theta)$ and the hypothesis to be tested is $H_0$:
$\theta > 0.0625$ against the alternative $H_A$: $\theta \le 0.0625$ at (some given) level of
significance $\alpha$.


EXAMPLE 2 In reference to Example 8 in Chapter 1, if X is the r.v. denoting those young
adults, among the n sampled, who listen to this particular weekend music
program, then $X \sim B(n, \theta)$. Then, arguing as in the previous example, we have
that the hypothesis to be tested is $H_0$: $\theta \le \theta_0$ (= 100p%) against the alternative
$H_A$: $\theta > \theta_0$ at level of significance $\alpha$.


EXAMPLE 3 Refer to Example 12 of Chapter 1, and let X be the r.v. denoting the mean
bacteria count per unit volume of water at a lake beach. Then $X \sim P(\theta)$ and
the hypothesis to be tested is $H_0$: $\theta > 200$ against $H_A$: $\theta \le 200$ at level of
significance $\alpha$.


EXAMPLE 4 Suppose that the mean $\theta$ of a r.v. X represents the dosage of a drug which is
used for the treatment of a certain disease. For this medication to be both safe
and effective, $\theta$ must satisfy the requirements $\theta_1 < \theta < \theta_2$, for two specified
values $\theta_1$ and $\theta_2$. Then, on the basis of previous discussions, the hypothesis
to be tested here is $H_0$: $\theta \le \theta_1$ or $\theta \ge \theta_2$ against the alternative $H_A$: $\theta_1 < \theta < \theta_2$
at the level of significance $\alpha$. Of course, we have to assume a certain
distribution for the r.v. X, which for good reasons is taken to be $N(\theta, \sigma^2)$, $\sigma$
known.


EXAMPLE 5 Refer to Example 16 in Chapter 1, and suppose that the survival time for a
terminal cancer patient treated with the standard treatment is a r.v. $X \sim N(\theta_1, \sigma_1^2)$.
Likewise, let the r.v. Y stand for the survival time for such a patient subject
to the new treatment, and let $Y \sim N(\theta_2, \sigma_2^2)$. Then the hypothesis to be tested
here is $H_0$: $\theta_2 = \theta_1$ against the alternative $H_A$: $\theta_2 > \theta_1$ at level of significance $\alpha$.


REMARK 1 The hypothesis to be tested could also be $\theta_2 \le \theta_1$, but the
possibility that $\theta_2 < \theta_1$ may be excluded; it can be assumed that the new treatment
cannot be inferior to the existing one. The supposition that $\theta_2 = \theta_1$, i.e., that there is
no difference between the two treatments, leads to the term "null" for the
hypothesis $H_0$: $\theta_2 = \theta_1$.

Examples 1-4 have the following common characteristics. A r.v. X is
distributed according to the p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, and we are interested in
testing one of the following hypotheses, each one at some specified level of
significance $\alpha$: $H_0$: $\theta > \theta_0$ against $H_A$: $\theta \le \theta_0$; $H_0$: $\theta \le \theta_0$ against $H_A$: $\theta > \theta_0$;
$H_0$: $\theta \le \theta_1$ or $\theta \ge \theta_2$ against $H_A$: $\theta_1 < \theta < \theta_2$. It is understood that in all cases $\theta$ remains
in $\Omega$. In Example 5, two Normally distributed populations are compared in
terms of their means, and the hypothesis tested is $H_0$: $\theta_2 = \theta_1$ against $H_A$:
$\theta_2 > \theta_1$. An example of a different nature would lead to testing the hypothesis
$H_0$: $\theta_2 < \theta_1$ against $H_A$: $\theta_2 = \theta_1$.

In the first four examples, the hypotheses stated are to be tested by means
of a random sample $X_1, \ldots, X_n$ from the underlying distribution. In the case of
Example 5, the hypothesis is to be tested by utilizing two independent random
samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ from the underlying distributions.

Observe that in all cases the hypotheses tested are composite, and so are the 
alternatives. We wish, of course, for the proposed tests to be optimal in some 
satisfactory sense. If the tests were to be UMP (uniformly most powerful), then 
they would certainly be highly desirable. In the following section, a somewhat 
general theory will be provided, which, when applied to the examples under 
consideration, will produce UMP tests. 





1.1 In the following examples, indicate which statements constitute a simple
and which a composite hypothesis:

(i) X is a r.v. whose p.d.f. f is given by $f(x) = 2e^{-2x}$, x > 0.

(ii) When tossing a coin, let X be the r.v. taking the value 1 if the head
appears and 0 if the tail appears. Then the statement is: The coin is
biased.

(iii) X is a r.v. whose expectation is equal to 5.


1.2 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with p.d.f. f which may be either Normal,
$N(\mu, \sigma^2)$, to be denoted by $f_N$, or Cauchy with parameters $\mu$ and $\sigma^2$, to be
denoted by $f_C$, where, we recall that:

$$f_N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/2\sigma^2}, \quad x \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma > 0,$$

$$f_C(x; \mu, \sigma^2) = \frac{\sigma}{\pi} \times \frac{1}{(x - \mu)^2 + \sigma^2}, \quad x \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma > 0.$$

Consider the following null hypotheses and the corresponding alternatives:
(i) $H_{01}$: f is Normal, $H_{A1}$: f is Cauchy.

(ii) $H_{02}$: f is Normal with $\mu < \mu_0$, $H_{A2}$: f is Cauchy with $\mu < \mu_0$.

(iii) $H_{03}$: f is Normal with $\mu = \mu_0$, $H_{A3}$: f is Cauchy with $\mu = \mu_0$.

(iv) $H_{04}$: f is Normal with $\mu = \mu_0$, $\sigma > \sigma_0$, $H_{A4}$: f is Cauchy with $\mu = \mu_0$,
$\sigma > \sigma_0$.

(v) $H_{05}$: f is Normal with $\mu = \mu_0$, $\sigma < \sigma_0$, $H_{A5}$: f is Cauchy with $\mu = \mu_0$,
$\sigma = \sigma_0$.

(vi) $H_{06}$: f is Normal with $\mu = \mu_0$, $\sigma = \sigma_0$, $H_{A6}$: f is Cauchy with $\mu = \mu_0$,
$\sigma = \sigma_0$.

State which of the $H_{0i}$ and which of the $H_{Ai}$, i = 1, ..., 6, are simple
and which are composite.


Uniformly Most Powerful Tests for Some Composite Hypotheses


In reference to Example 1, one could certainly consider testing the simple
hypothesis $H_0$: $\theta = \theta_0$ (e.g., 0.05) against the simple alternative $H_A$: $\theta = \theta_1$,
for some fixed $\theta_1$ either $> \theta_0$ or $< \theta_0$. However, such a testing framework would
be highly unrealistic. It is simply not reasonable to isolate two single values
from the continuum of values (0, 1) and test one against the other. What is
meaningful is the way we actually formulated $H_0$ in this example. Nevertheless,
it is still true that a long journey begins with the first step, and this applies here
as well. Accordingly, we are going to start out with the problem of testing a
simple hypothesis against a simple alternative, which is what the celebrated
Neyman-Pearson Fundamental Lemma is all about.


THEOREM 1 (Neyman-Pearson Fundamental Lemma) Let $X_1, \ldots, X_n$ be a random
sample with p.d.f. f unknown. We are interested in testing the
simple hypothesis $H_0$: $f = f_0$ (specified) against the simple alternative
$H_A$: $f = f_1$ (specified) at level of significance $\alpha$ ($0 < \alpha < 1$). To this end,
define the test $\varphi$ as follows:

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } f_1(x_1) \cdots f_1(x_n) > C f_0(x_1) \cdots f_0(x_n) \\ \gamma & \text{if } f_1(x_1) \cdots f_1(x_n) = C f_0(x_1) \cdots f_0(x_n) \\ 0 & \text{if } f_1(x_1) \cdots f_1(x_n) < C f_0(x_1) \cdots f_0(x_n), \end{cases} \qquad (1)$$

where the constants C and $\gamma$ ($C > 0$, $0 \le \gamma \le 1$) are defined through the
relationship:

$$E_{f_0}\varphi(X_1, \ldots, X_n) = P_{f_0}[f_1(X_1) \cdots f_1(X_n) > C f_0(X_1) \cdots f_0(X_n)]
+ \gamma P_{f_0}[f_1(X_1) \cdots f_1(X_n) = C f_0(X_1) \cdots f_0(X_n)] = \alpha. \qquad (2)$$

Then the test $\varphi$ is MP among all tests with level of significance $\le \alpha$.


REMARK 2 The test $\varphi$ is a randomized test, if $0 < \gamma < 1$. The necessity
for a randomized test stems from relation (2), where the left-hand side has
to be equal to $\alpha$. If the $X_i$'s are discrete, the presence of $\gamma$ ($0 < \gamma < 1$)
is indispensable. In case, however, the $X_i$'s are of the continuous type, then
$\gamma = 0$ and the test is nonrandomized.

The appearance of $f_0$ as a subscript indicates, of course, that expectations
and probabilities are calculated by using the p.d.f. $f_0$ for the $X_i$'s.
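To see how relation (2) pins down C and $\gamma$ in a discrete case, here is a small Python sketch for a finite sample space. Everything in it is an illustrative assumption rather than part of the text: the function names, the restriction to outcomes with positive probability under $H_0$, and the toy problem of one observation from B(3, $\theta$) with $H_0$: $\theta = 0.5$ against $H_A$: $\theta = 0.75$ at $\alpha = 0.05$.

```python
from math import comb

def neyman_pearson_constants(p0, p1, alpha):
    """Find (C, gamma) with P0(L1/L0 > C) + gamma * P0(L1/L0 = C) = alpha.

    p0 and p1 map each outcome of a finite sample space to its probability
    under H0 and H_A; every outcome is assumed to have p0 > 0.
    """
    ratio = {x: p1[x] / p0[x] for x in p0}
    mass_above = 0.0  # P0(ratio > current candidate C)
    for c in sorted(set(ratio.values()), reverse=True):
        mass_at = sum(p0[x] for x in p0 if ratio[x] == c)
        if mass_above + mass_at >= alpha:
            return c, (alpha - mass_above) / mass_at
        mass_above += mass_at
    raise ValueError("alpha must lie in (0, 1)")

def binomial_pmf(n, theta):
    """Probabilities of 0, 1, ..., n successes under B(n, theta)."""
    return {x: comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in range(n + 1)}

# Hypothetical toy problem: one observation from B(3, theta).
p0 = binomial_pmf(3, 0.5)
p1 = binomial_pmf(3, 0.75)
C, gamma = neyman_pearson_constants(p0, p1, 0.05)
```

In this toy problem no outcome has likelihood ratio above C, so the level $\alpha = 0.05$ is attained only by rejecting with probability $\gamma$ when the ratio equals C (here, when all three trials are successes); this is the situation Remark 2 describes for discrete r.v.'s.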


PROOF OF THEOREM 1 (Outline for $X_i$'s of the Continuous Type) To simplify
the notation, write 0 (or 1) rather than $f_0$ (or $f_1$) when $f_0$ (or $f_1$) occurs
as a subscript. Also, it would be convenient to use the vector notation $X = (X_1, \ldots, X_n)$
and $x = (x_1, \ldots, x_n)$. First, we show that the test $\varphi$ is of level
$\alpha$. Indeed, let $T = \{x \in \mathbb{R}^n;\ L_0(x) > 0\}$, where $L_0(x) = f_0(x_1) \cdots f_0(x_n)$, and
likewise $L_1(x) = f_1(x_1) \cdots f_1(x_n)$. Then, if $D = X^{-1}(T)$, i.e., $D = \{s \in S;\ X(s) \in T\}$,
so that $D^c = \{s \in S;\ X(s) \in T^c\}$, it follows that $P_0(D^c) = P_0(X \in T^c) = \int_{T^c} L_0(x)\,dx = 0$.
Therefore, in calculating probabilities by using
the p.d.f. $L_0$, it suffices to restrict ourselves to the set D. Then, by means of (2),

$$E_0\varphi(X) = P_0[L_1(X) > CL_0(X)]$$
$$= P_0\{[L_1(X) > CL_0(X)] \cap D\}$$
$$= P_0\left\{\left[\frac{L_1(X)}{L_0(X)} > C\right] \cap D\right\} \quad (\text{since } L_0(X) > 0 \text{ on } D)$$
$$= P_0(Y > C) = 1 - P_0(Y \le C) = g(C), \text{ say},$$

where $Y = \frac{L_1(X)}{L_0(X)}$ on D, and arbitrary on $D^c$. The picture of $1 - P_0(Y \le C)$ is
depicted in Figure 11.1, and it follows that, for each $\alpha$ ($0 < \alpha < 1$), there is
(essentially) a unique C such that $1 - P_0(Y \le C) = \alpha$. That is, $E_0\varphi(X) = \alpha$,
which shows that the test $\varphi$ is of level $\alpha$.

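Figure 11.1 The Graph of the Function $g(C) = 1 - P_0(Y \le C)$

Next, it is shown that $\varphi$ is MP as described by showing that, if $\varphi^*$ is any
other test with $E_0\varphi^*(X) = \alpha^* \le \alpha$, then $\pi_\varphi(1) = E_1\varphi(X) \ge E_1\varphi^*(X) = \pi_{\varphi^*}(1)$
(i.e., the power of $\varphi$ is not smaller than the power of any such test $\varphi^*$ of level
of significance $\le \alpha$). Indeed, define $B^+$ and $B^-$ by:

$$B^+ = \{x \in \mathbb{R}^n;\ \varphi(x) - \varphi^*(x) > 0\} = (\varphi > \varphi^*),$$
$$B^- = \{x \in \mathbb{R}^n;\ \varphi(x) - \varphi^*(x) < 0\} = (\varphi < \varphi^*).$$

Then, clearly, $B^+ \cap B^- = \emptyset$, and, by means of (1),

$$B^+ = (\varphi > \varphi^*) \subseteq (\varphi = 1) \subseteq (L_1 \ge CL_0), \quad B^- = (\varphi < \varphi^*) \subseteq (\varphi = 0) \subseteq (L_1 \le CL_0). \qquad (3)$$

Therefore

$$\int_{\mathbb{R}^n}[\varphi(x) - \varphi^*(x)][L_1(x) - CL_0(x)]\,dx
= \int_{B^+}[\varphi(x) - \varphi^*(x)][L_1(x) - CL_0(x)]\,dx
+ \int_{B^-}[\varphi(x) - \varphi^*(x)][L_1(x) - CL_0(x)]\,dx \ge 0 \quad \text{by (3)}.$$

Hence

$$\int_{\mathbb{R}^n}\varphi(x)L_1(x)\,dx - \int_{\mathbb{R}^n}\varphi^*(x)L_1(x)\,dx \ge C\left[\int_{\mathbb{R}^n}\varphi(x)L_0(x)\,dx - \int_{\mathbb{R}^n}\varphi^*(x)L_0(x)\,dx\right]
= C(\alpha - \alpha^*) \ge 0 \quad (\text{since } \alpha^* \le \alpha).$$

Hence $\int_{\mathbb{R}^n}\varphi(x)L_1(x)\,dx = E_1\varphi(X) \ge E_1\varphi^*(X) = \int_{\mathbb{R}^n}\varphi^*(x)L_1(x)\,dx$. ▲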


This theorem has the following corollary, according to which the power
of the MP test $\varphi$ cannot be $< \alpha$; not very much to be sure, but yet somewhat
reassuring.


COROLLARY For the MP test $\varphi$, $\pi_\varphi(1) = E_1\varphi(X) \ge \alpha$.


PROOF Just compare the power of $\varphi$ with that of $\varphi^* \equiv \alpha$ whose level of
significance and power are both equal to $\alpha$. ▲


REMARK 3 The theorem was formulated in terms of any two p.d.f.'s $f_0$ and
$f_1$ as the two possible options for f. In a parametric setting, where f is of the
form $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}^r$, $r \ge 1$, the p.d.f.'s $f_0$ and $f_1$ will correspond to two
specified values of $\theta$; $\theta_0$ and $\theta_1$, say. That is, $f_0 = f(\cdot; \theta_0)$ and $f_1 = f(\cdot; \theta_1)$.

The following examples will help illustrate how the Neyman-Pearson
Fundamental Lemma actually applies in concrete cases.


EXAMPLE 6 On the basis of a random sample of size 1 from the p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$,
0 < x < 1 ($\theta > 1$):


(i) Use the Neyman-Pearson Fundamental Lemma to derive the MP test for
testing the hypothesis $H_0$: $\theta = \theta_0$ against the alternative $H_A$: $\theta = \theta_1$ at
level of significance $\alpha$.

(ii) Derive the formula for the power $\pi(\theta_1)$.
(iii) Give numerical values for parts (i) and (ii) when $\theta_0 = 4$ and $\theta_1 = 6$, $\theta_1 = 2$;
take $\alpha = 0.05$.


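DISCUSSION In the first place, the given function is, indeed, a p.d.f., since
$\int_0^1 \theta x^{\theta - 1}\,dx = x^\theta\big|_0^1 = 1$. Next:
(i) $H_0$ is rejected, if for some positive constant $C^*$:

$$\frac{\theta_1 x^{\theta_1 - 1}}{\theta_0 x^{\theta_0 - 1}} > C^*, \quad \text{or} \quad x^{\theta_1 - \theta_0} > \frac{\theta_0 C^*}{\theta_1}, \quad \text{or} \quad (\theta_1 - \theta_0)\log x > \log\left(\frac{\theta_0 C^*}{\theta_1}\right).$$

Now, if $\theta_1 > \theta_0$, this last inequality is equivalent to:

$$\log x > \log\left(\frac{\theta_0 C^*}{\theta_1}\right)^{1/(\theta_1 - \theta_0)} = \log C \quad \left(C = \left(\frac{\theta_0 C^*}{\theta_1}\right)^{1/(\theta_1 - \theta_0)}\right),$$

or x > C. If $\theta_1 < \theta_0$, the final form of the inequality becomes x < C. For
$\theta_1 > \theta_0$, the cutoff point is calculated by:

$$P_{\theta_0}(X > C) = \int_C^1 \theta_0 x^{\theta_0 - 1}\,dx = x^{\theta_0}\big|_C^1 = 1 - C^{\theta_0} = \alpha, \quad \text{or} \quad C = (1 - \alpha)^{1/\theta_0}.$$

For $\theta_1 < \theta_0$, we have:

$$P_{\theta_0}(X < C) = x^{\theta_0}\big|_0^C = C^{\theta_0} = \alpha, \quad \text{or} \quad C = \alpha^{1/\theta_0}.$$

Then, for $\theta_1 > \theta_0$, reject $H_0$ when $x > (1 - \alpha)^{1/\theta_0}$; and, for $\theta_1 < \theta_0$, reject
$H_0$ when $x < \alpha^{1/\theta_0}$.
(ii) For $\theta_1 > \theta_0$, the power of the test is given by:

$$\pi(\theta_1) = P_{\theta_1}(X > C) = \int_C^1 \theta_1 x^{\theta_1 - 1}\,dx = x^{\theta_1}\big|_C^1 = 1 - C^{\theta_1}, \quad \text{or} \quad \pi(\theta_1) = 1 - (1 - \alpha)^{\theta_1/\theta_0}.$$

For $\theta_1 < \theta_0$, we have:

$$\pi(\theta_1) = P_{\theta_1}(X < C) = \int_0^C \theta_1 x^{\theta_1 - 1}\,dx = x^{\theta_1}\big|_0^C = C^{\theta_1} = \alpha^{\theta_1/\theta_0}.$$

That is,
$\pi(\theta_1) = 1 - (1 - \alpha)^{\theta_1/\theta_0}$ for $\theta_1 > \theta_0$; $\pi(\theta_1) = \alpha^{\theta_1/\theta_0}$ for $\theta_1 < \theta_0$.
(iii) For $\theta_1 = 6$, the cutoff point is:
$(1 - 0.05)^{1/4} = 0.95^{0.25} \simeq 0.987$,
and the power is: $\pi(6) = 1 - (0.95)^{1.5} \simeq 1 - 0.926 = 0.074$. For $\theta_1 = 2$,
the cutoff point is: $(0.05)^{1/4} = (0.05)^{0.25} \simeq 0.473$, and the power is: $\pi(2) = (0.05)^{1/2} \simeq 0.224$.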
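The closed-form cutoff and power derived above are easy to check numerically. The following Python sketch (the function name is hypothetical) evaluates them for the values used in part (iii).

```python
def cutoff_and_power(theta0, theta1, alpha):
    """MP test of H0: theta = theta0 vs HA: theta = theta1 for f(x; theta) = theta * x**(theta - 1).

    Returns (cutoff C, power at theta1). For theta1 > theta0 the test rejects
    when x > C; for theta1 < theta0 it rejects when x < C.
    """
    if theta1 == theta0:
        raise ValueError("theta1 must differ from theta0")
    if theta1 > theta0:
        cutoff = (1 - alpha) ** (1 / theta0)
        power = 1 - cutoff ** theta1          # 1 - (1 - alpha)**(theta1 / theta0)
    else:
        cutoff = alpha ** (1 / theta0)
        power = cutoff ** theta1              # alpha**(theta1 / theta0)
    return cutoff, power

upper_case = cutoff_and_power(4, 6, 0.05)   # theta1 = 6 > theta0 = 4
lower_case = cutoff_and_power(4, 2, 0.05)   # theta1 = 2 < theta0 = 4
```

The two calls reproduce, up to rounding, the cutoff points 0.987 and 0.473 and the powers 0.074 and 0.224 of part (iii).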


EXAMPLE 7 Refer to Example 6 and:


(i) Show that the Neyman-Pearson test which rejects the (simple) hypothesis
$H_0$: $\theta = \theta_0$ when tested against the (simple) alternative $H_{A,\theta_1}$: $\theta = \theta_1$, for
some fixed $\theta_1 > \theta_0$, at level of significance $\alpha$, is, actually, UMP for testing
$H_0$ against the composite alternative $H_A$: $\theta > \theta_0$ at level of significance $\alpha$.

(ii) Also, show that the Neyman-Pearson test which rejects the hypothesis
$H_0$: $\theta = \theta_0$ when tested against the (simple) alternative $H'_{A,\theta_1}$: $\theta = \theta_1$, for
some fixed $\theta_1 < \theta_0$, at level of significance $\alpha$, is, actually, UMP for testing
$H_0$ against the composite alternative $H'_A$: $\theta < \theta_0$ at level of significance $\alpha$.

(iii) Show that there is no UMP test for testing the hypothesis $H_0$: $\theta = \theta_0$
against the (double-sided) composite alternative $H''_A$: $\theta \ne \theta_0$ at level of
significance $\alpha$.


DISCUSSION


(i) Indeed, by part (i) of Example 6, the MP test for testing $H_0$: $\theta = \theta_0$ against
$H_{A,\theta_1}$: $\theta = \theta_1$ rejects $H_0$ when $x > (1 - \alpha)^{1/\theta_0}$, regardless of the specific
value of $\theta_1$, provided $\theta_1 > \theta_0$. Thus, this test becomes a UMP test when
$H_{A,\theta_1}$ is replaced by $H_A$: $\theta > \theta_0$.

(ii) Likewise, by Example 6(i), the MP test for testing $H_0$: $\theta = \theta_0$ against
$H'_{A,\theta_1}$: $\theta = \theta_1$ rejects $H_0$ when $x < \alpha^{1/\theta_0}$, regardless of the specific value
$\theta_1$, provided $\theta_1 < \theta_0$. Thus, this test becomes a UMP test when $H'_{A,\theta_1}$ is
replaced by $H'_A$: $\theta < \theta_0$.

(iii) The rejection region for testing the hypothesis $H_0$: $\theta = \theta_0$ against the
alternative $H_A$: $\theta > \theta_0$ is $R_1 = ((1 - \alpha)^{1/\theta_0}, 1)$, and the rejection region for
testing $H_0$ against $H'_A$: $\theta < \theta_0$ is $R_2 = (0, \alpha^{1/\theta_0})$. Since these MP regions
depend on which side of $\theta_0$ lie the alternative $\theta$'s and are different, there
cannot exist a UMP test for testing $H_0$ against $H''_A$: $\theta \ne \theta_0$.


EXAMPLE 8 On the basis of a random sample of size 1 from the p.d.f. $f(x; \theta) = 1 + \theta^2\left(\frac{1}{2} - x\right)$,
0 < x < 1, $-1 \le \theta \le 1$:


(i) Use the Neyman-Pearson Fundamental Lemma to derive the MP test for
testing the hypothesis $H_0$: $\theta = 0$ (i.e., the p.d.f. is U(0, 1)) against the
alternative $H_A$: $\theta = \theta_1$ at level of significance $\alpha$.

(ii) Investigate whether or not the test derived in part (i) is a UMP test for
testing $H_0$: $\theta = 0$ against the alternative $H'_A$: $\theta \ne 0$.

(iii) Determine the test in part (i) for $\alpha = 0.05$.

(iv) Determine the power of the test in part (i).

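DISCUSSION First, the function given is a p.d.f., because it is nonnegative
and $\int_0^1 \left[1 + \theta^2\left(\frac{1}{2} - x\right)\right] dx = 1 + \theta^2\left(\frac{1}{2} - \frac{1}{2}\right) = 1$. Next:

(i) $H_0$ is rejected whenever $1 + \theta_1^2\left(\frac{1}{2} - x\right) > C^*$, or $x < C$, where
$C = \frac{1}{2} - (C^* - 1)/\theta_1^2$, and $C$ is determined by $P_0(X < C) = \alpha$, so that $C = \alpha$,
since $X \sim U(0, 1)$ under $H_0$. Thus, $H_0$ is rejected when $x < \alpha$.

(ii) Observe that the test is independent of $\theta_1$, and since it is MP against each
fixed $\theta_1$, it follows that it is UMP for testing $H_0$ against $H'_A: \theta \neq 0$.

(iii) For $\alpha = 0.05$, the test in part (i) rejects $H_0$ whenever $x < 0.05$.

(iv) For $\theta \neq 0$, the power of the test is:

\[ \pi(\theta) = P_\theta(X < \alpha) = \int_0^\alpha \left[1 + \theta^2\left(\tfrac{1}{2} - x\right)\right] dx = \tfrac{1}{2}\alpha(1 - \alpha)\theta^2 + \alpha. \]

Thus, e.g., $\pi(\pm 1) = \frac{1}{2}\alpha(1 - \alpha) + \alpha$, $\pi(\pm\frac{1}{2}) = \frac{1}{8}\alpha(1 - \alpha) + \alpha$, which for
$\alpha = 0.05$ become: $\pi(\pm 1) \simeq 0.074$, $\pi(\pm\frac{1}{2}) \simeq 0.056$.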
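The closed-form power in part (iv) can be checked numerically. The following is a minimal sketch in Python (the function names are illustrative and not part of the text); it evaluates the closed form and compares it with a crude midpoint-rule integration of the p.d.f. over the rejection region $(0, \alpha)$.

```python
def pdf(x, theta):
    """The p.d.f. f(x; theta) = 1 + theta^2 (1/2 - x) on (0, 1)."""
    return 1.0 + theta**2 * (0.5 - x)

def power(theta, alpha):
    """Closed-form power of the test that rejects H0: theta = 0 when x < alpha."""
    return alpha + 0.5 * theta**2 * alpha * (1.0 - alpha)

def power_numeric(theta, alpha, steps=10_000):
    """Midpoint-rule approximation of the integral of the p.d.f. over (0, alpha)."""
    width = alpha / steps
    return sum(pdf((i + 0.5) * width, theta) for i in range(steps)) * width

alpha = 0.05
for theta in (0.0, 0.5, 1.0):
    print(theta, round(power(theta, alpha), 3), round(power_numeric(theta, alpha), 3))
```

Since the power depends on $\theta$ only through $\theta^2$, the values at $-\theta$ and $\theta$ coincide, and the power never exceeds $\frac{1}{2}\alpha(1-\alpha) + \alpha$ on $[-1, 1]$.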








11.2.1 Exponential Type Families of p.d.f.'s


The remarkable thing here is that, if the p.d.f. $f(\cdot; \theta)$ is of a certain general
form to be discussed below, then the apparently simple-minded Theorem 1
leads to UMP tests; it is the stepping stone for getting to those tests.


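DEFINITION 1
The p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, is said to be of the exponential type, if

\[ f(x; \theta) = C(\theta)e^{Q(\theta)T(x)} \times h(x), \quad x \in \mathbb{R}, \tag{4} \]

where $Q$ is strictly monotone and $h$ does not involve $\theta$ in any way; $C(\theta)$
is simply a normalizing constant.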


Most of the p.d.f.’s we have encountered so far are of the form (4). Here 
are some examples. 


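EXAMPLE 9 The $B(n, \theta)$ p.d.f. is of the exponential type.


DISCUSSION Indeed,

\[ f(x; \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n-x} I_A(x), \quad A = \{0, 1, \ldots, n\}, \]

where, we recall that $I_A$ is the indicator of $A$; i.e., $I_A(x) = 1$ if $x \in A$, and
$I_A(x) = 0$ if $x \notin A$.
Hence

\[ f(x; \theta) = (1 - \theta)^n e^{\left[\log\left(\frac{\theta}{1-\theta}\right)\right]x} \times \binom{n}{x} I_A(x), \]

so that $f(x; \theta)$ is of the form (4) with $C(\theta) = (1 - \theta)^n$, $Q(\theta) = \log\left(\frac{\theta}{1-\theta}\right)$
strictly increasing (since $\frac{d}{d\theta}\left(\frac{\theta}{1-\theta}\right) = \frac{1}{(1-\theta)^2} > 0$ and $\log(\cdot)$ is strictly increasing),
$T(x) = x$, and $h(x) = \binom{n}{x} I_A(x)$.


EXAMPLE 10 The $P(\theta)$ p.d.f. is of the exponential type.


DISCUSSION Here

\[ f(x; \theta) = \frac{e^{-\theta}\theta^x}{x!} I_A(x), \quad A = \{0, 1, \ldots\}. \]

Hence

\[ f(x; \theta) = e^{-\theta} \times e^{(\log\theta)x} \times \frac{1}{x!} I_A(x), \]

so that $f(x; \theta)$ is of the form (4) with $C(\theta) = e^{-\theta}$, $Q(\theta) = \log\theta$ strictly increasing,
$T(x) = x$, and $h(x) = \frac{1}{x!} I_A(x)$.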


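EXAMPLE 11 The $N(\theta, \sigma^2)$ ($\sigma$ known) p.d.f. is of the exponential type.


DISCUSSION In fact,

\[ f(x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\theta)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{\theta^2}{2\sigma^2}} \times e^{\frac{\theta}{\sigma^2}x} \times e^{-\frac{x^2}{2\sigma^2}}, \]

and this is of the form (4) with $C(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{\theta^2}{2\sigma^2}}$, $Q(\theta) = \frac{\theta}{\sigma^2}$ strictly increasing,
$T(x) = x$, and $h(x) = e^{-\frac{x^2}{2\sigma^2}}$.


EXAMPLE 12 The $N(\mu, \theta)$ ($\mu$ known) p.d.f. is of the exponential type.


DISCUSSION Here

\[ f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}} e^{-\frac{1}{2\theta}(x-\mu)^2}, \]

and this is of the form (4) with $C(\theta) = \frac{1}{\sqrt{2\pi\theta}}$, $Q(\theta) = -\frac{1}{2\theta}$ strictly increasing
(since $\frac{d}{d\theta}\left(-\frac{1}{2\theta}\right) = \frac{1}{2\theta^2} > 0$), $T(x) = (x - \mu)^2$, and $h(x) = 1$.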
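The factorizations in the Binomial and Poisson examples can be verified numerically. The sketch below (Python, with hypothetical helper names chosen for illustration) evaluates each p.d.f. both in its usual form and in the form $C(\theta)e^{Q(\theta)T(x)}h(x)$ of relation (4), and checks that the two agree on a few points of the support.

```python
import math

def binom_pmf(x, n, theta):
    """Usual form of the B(n, theta) p.d.f."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def binom_exp_form(x, n, theta):
    """Form (4): C(theta) = (1-theta)^n, Q(theta) = log(theta/(1-theta)), T(x) = x."""
    C = (1 - theta)**n
    Q = math.log(theta / (1 - theta))
    h = math.comb(n, x)
    return C * math.exp(Q * x) * h

def poisson_pmf(x, theta):
    """Usual form of the P(theta) p.d.f."""
    return math.exp(-theta) * theta**x / math.factorial(x)

def poisson_exp_form(x, theta):
    """Form (4): C(theta) = e^{-theta}, Q(theta) = log(theta), T(x) = x, h(x) = 1/x!."""
    return math.exp(-theta) * math.exp(math.log(theta) * x) / math.factorial(x)

for x in range(0, 6):
    assert math.isclose(binom_pmf(x, 10, 0.3), binom_exp_form(x, 10, 0.3))
    assert math.isclose(poisson_pmf(x, 2.5), poisson_exp_form(x, 2.5))
```

In both cases $T(x) = x$, so for a random sample the statistic $V = \sum_i T(x_i)$ used in the theorems of the next subsection is simply the sample total.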


11.2.2 Uniformly Most Powerful Tests for Some Composite Hypotheses


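We may now proceed with the formulation of the following important results.


THEOREM 2
Let $X_1, \ldots, X_n$ be a random sample with exponential type p.d.f. $f(x; \theta)$,
$\theta \in \Omega \subseteq \mathbb{R}$; i.e.,

\[ f(x; \theta) = C(\theta)e^{Q(\theta)T(x)} \times h(x), \quad x \in \mathbb{R}, \]

and set $V(x_1, \ldots, x_n) = \sum_{i=1}^n T(x_i)$. Then each one of the tests defined
below is UMP of level $\alpha$ for testing the hypothesis specified against the
respective alternative among all tests of level $\le \alpha$. Specifically:

(i) Let $Q$ be strictly increasing.
Then for testing $H_0: \theta \le \theta_0$ against $H_A: \theta > \theta_0$, the UMP test is
given by:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } V(x_1, \ldots, x_n) > C \\ \gamma & \text{if } V(x_1, \ldots, x_n) = C \\ 0 & \text{if } V(x_1, \ldots, x_n) < C, \end{cases} \tag{5} \]

where the constants $C$ and $\gamma$ ($C > 0$, $0 \le \gamma \le 1$) are determined by:

\[ E_{\theta_0}\varphi(X_1, \ldots, X_n) = P_{\theta_0}[V(X_1, \ldots, X_n) > C] + \gamma P_{\theta_0}[V(X_1, \ldots, X_n) = C] = \alpha. \tag{6} \]

The power of the test is given by:

\[ \pi_\varphi(\theta) = P_\theta[V(X_1, \ldots, X_n) > C] + \gamma P_\theta[V(X_1, \ldots, X_n) = C] \quad (\theta > \theta_0). \tag{7} \]

If the hypothesis to be tested is $H_0: \theta \ge \theta_0$, so that the alternative
is $H_A: \theta < \theta_0$, then the UMP test is given by (5) and (6) with reversed
inequalities; i.e.,

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } V(x_1, \ldots, x_n) < C \\ \gamma & \text{if } V(x_1, \ldots, x_n) = C \\ 0 & \text{if } V(x_1, \ldots, x_n) > C, \end{cases} \tag{8} \]

where the constants $C$ and $\gamma$ ($C > 0$, $0 \le \gamma \le 1$) are determined by:

\[ E_{\theta_0}\varphi(X_1, \ldots, X_n) = P_{\theta_0}[V(X_1, \ldots, X_n) < C] + \gamma P_{\theta_0}[V(X_1, \ldots, X_n) = C] = \alpha. \tag{9} \]

The power of the test is given by:

\[ \pi_\varphi(\theta) = P_\theta[V(X_1, \ldots, X_n) < C] + \gamma P_\theta[V(X_1, \ldots, X_n) = C] \quad (\theta < \theta_0). \tag{10} \]

(ii) Let $Q$ be strictly decreasing.
Then for testing $H_0: \theta \le \theta_0$ against $H_A: \theta > \theta_0$, the UMP test is
given by (8) and (9), and the power is given by (10).
For testing $H_0: \theta \ge \theta_0$ against $H_A: \theta < \theta_0$, the UMP test is given
by (5) and (6), and the power is given by (7).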








PROOF (Just Pointing Out the Main Points) The proof of this theorem is based
on Theorem 1 and also the specific form assumed for the p.d.f. $f(\cdot; \theta)$. As
a rough illustration, consider the case that $Q$ is strictly increasing and the
hypothesis to be tested is $H_0: \theta \le \theta_0$. For an arbitrary $\theta_1 < \theta_0$, it is shown that
$E_{\theta_1}\varphi(X_1, \ldots, X_n) < \alpha$. This establishes that $E_\theta\varphi(X_1, \ldots, X_n) \le \alpha$ for all $\theta \le \theta_0$,
so that $\varphi$ is of level $\alpha$. Next, take an arbitrary $\theta_1 > \theta_0$ and consider the problem
of testing the simple hypothesis $H_0: \theta = \theta_0$ against the simple alternative $H_{A,1}$:
$\theta = \theta_1$. It is shown that the MP test, provided by Theorem 1, actually, coincides
with the test $\varphi$ given by (5) and (6). This shows that the test $\varphi$ is UMP. The
same reasoning applies for the remaining cases. ▲


Figures 11.2 and 11.3 depict the form of the power of the UMP tests for the
one-sided hypotheses $H_0: \theta \le \theta_0$ and $H_0: \theta \ge \theta_0$.


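Figure 11.2 $H_0: \theta \le \theta_0$, $H_A: \theta > \theta_0$: The Power Curve

Figure 11.3 $H_0: \theta \ge \theta_0$, $H_A: \theta < \theta_0$: The Power Curve


THEOREM 3
Let $X_1, \ldots, X_n$ be a random sample with p.d.f. as in Theorem 2, and let
$V(x_1, \ldots, x_n)$ be as in the same theorem. Consider the problem of testing
the hypothesis $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$ against the alternative $H_A: \theta_1 < \theta < \theta_2$
at level of significance $\alpha$. Then the tests defined below are UMP of level
$\alpha$ among all tests of level $\le \alpha$. Specifically:

(i) If $Q$ is strictly increasing, the UMP test is given by:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } C_1 < V(x_1, \ldots, x_n) < C_2 \\ \gamma_1 & \text{if } V(x_1, \ldots, x_n) = C_1 \\ \gamma_2 & \text{if } V(x_1, \ldots, x_n) = C_2 \\ 0 & \text{otherwise,} \end{cases} \tag{11} \]

where the constants $C_1$, $C_2$ and $\gamma_1$, $\gamma_2$ ($C_1 > 0$, $C_2 > 0$, $0 \le \gamma_1 \le 1$,
$0 \le \gamma_2 \le 1$) are determined through the relationships:

\[ E_{\theta_1}\varphi(X_1, \ldots, X_n) = P_{\theta_1}[C_1 < V(X_1, \ldots, X_n) < C_2] + \gamma_1 P_{\theta_1}[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_{\theta_1}[V(X_1, \ldots, X_n) = C_2] = \alpha, \tag{12} \]

\[ E_{\theta_2}\varphi(X_1, \ldots, X_n) = P_{\theta_2}[C_1 < V(X_1, \ldots, X_n) < C_2] + \gamma_1 P_{\theta_2}[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_{\theta_2}[V(X_1, \ldots, X_n) = C_2] = \alpha. \tag{13} \]

The power of the test is given by:

\[ \pi_\varphi(\theta) = P_\theta[C_1 < V(X_1, \ldots, X_n) < C_2] + \gamma_1 P_\theta[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_\theta[V(X_1, \ldots, X_n) = C_2] \quad (\theta_1 < \theta < \theta_2). \tag{14} \]

(ii) If $Q$ is strictly decreasing, then the UMP test is given by (11) and
(12)-(13) with reversed inequalities; i.e.,

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } V(x_1, \ldots, x_n) < C_1 \text{ or } V(x_1, \ldots, x_n) > C_2 \\ \gamma_1 & \text{if } V(x_1, \ldots, x_n) = C_1 \\ \gamma_2 & \text{if } V(x_1, \ldots, x_n) = C_2 \\ 0 & \text{otherwise,} \end{cases} \tag{15} \]

and the constants $C_1$, $C_2$ and $\gamma_1$, $\gamma_2$ are determined by:

\[ E_{\theta_1}\varphi(X_1, \ldots, X_n) = P_{\theta_1}[V(X_1, \ldots, X_n) < C_1 \text{ or } V(X_1, \ldots, X_n) > C_2] + \gamma_1 P_{\theta_1}[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_{\theta_1}[V(X_1, \ldots, X_n) = C_2] = \alpha, \tag{16} \]

\[ E_{\theta_2}\varphi(X_1, \ldots, X_n) = P_{\theta_2}[V(X_1, \ldots, X_n) < C_1 \text{ or } V(X_1, \ldots, X_n) > C_2] + \gamma_1 P_{\theta_2}[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_{\theta_2}[V(X_1, \ldots, X_n) = C_2] = \alpha. \tag{17} \]

The power of the test is given by:

\[ \pi_\varphi(\theta) = P_\theta[V(X_1, \ldots, X_n) < C_1 \text{ or } V(X_1, \ldots, X_n) > C_2] + \gamma_1 P_\theta[V(X_1, \ldots, X_n) = C_1] + \gamma_2 P_\theta[V(X_1, \ldots, X_n) = C_2] \quad (\theta_1 < \theta < \theta_2). \tag{18} \]

The power of the UMP test is depicted in Figure 11.4.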











Figure 11.4 $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$, $H_A: \theta_1 < \theta < \theta_2$: The Power Curve























2.1 If $X_1, \ldots, X_{16}$ are independent r.v.'s:

(i) Construct the MP test of the hypothesis $H_0$: the common distribution
of the $X_i$'s is $N(0, 9)$ against the alternative $H_A$: the common
distribution of the $X_i$'s is $N(1, 9)$; take $\alpha = 0.05$.

(ii) Also, determine the power of the test.


2.2 Let $X_1, \ldots, X_n$ be independent r.v.'s distributed as $N(\mu, \sigma^2)$, where $\mu$ is
unknown and $\sigma$ is known.
(i) For testing the hypothesis $H_0: \mu = 0$ against the alternative $H_A: \mu = 1$,
show that the sample size $n$ can be determined to achieve a given level
of significance $\alpha$ and given power $\pi(1)$.
(ii) What is the numerical value of $n$ for $\alpha = 0.05$, $\pi(1) = 0.9$ when $\sigma = 1$?

2.3 (i) Let $X_1, \ldots, X_n$ be independent r.v.'s distributed as $N(\mu, \sigma^2)$, where
$\mu$ is unknown and $\sigma$ is known. Derive the MP test for testing the
hypothesis $H_0: \mu = \mu_1$ against the alternative $H_A: \mu = \mu_2$ ($\mu_2 > \mu_1$)
at level of significance $\alpha$.

(ii) Find an expression for computing the power of the test.
(iii) Carry out the testing hypothesis and compute the power for $n = 100$,
$\sigma^2 = 4$, $\mu_1 = 3$, $\mu_2 = 3.5$, $\bar{x} = 3.2$, and $\alpha = 0.01$.


2.4 Let $X_1, \ldots, X_n$ be independent r.v.'s distributed as $N(\mu, \sigma^2)$ with $\mu$ unknown
and $\sigma$ known. Suppose we wish to test the hypothesis $H_0: \mu = \mu_0$
against the alternative $H_A: \mu = \mu_1$ ($\mu_1 > \mu_0$).

(i) Derive the MP test for testing $H_0$ against $H_A$.

(ii) For a given level of significance $\alpha_n$ ($< 0.5$) and given power $\pi_n$ ($> 0.5$),
determine the cutoff point $C_n$ and the sample size $n$ for which both $\alpha_n$
and $\pi_n$ are attained.

(iii) Show that $\alpha_n \to 0$ and $\pi_n \to 1$ as $n \to \infty$.
(iv) Determine the sample size $n$ and the cutoff point $C_n$ for $\mu_0 = 0$, $\mu_1 = 1$,
$\sigma = 1$, $\alpha_n = 0.001$, and $\pi_n = 0.995$.





2.5 Let $X_1, \ldots, X_n$ be independent r.v.'s having the Gamma distribution with
$\alpha$ known and $\beta$ unknown.
(i) Construct the MP test for testing the hypothesis $H_0: \beta = \beta_1$ against
the alternative $H_A: \beta = \beta_2$ ($\beta_2 > \beta_1$) at level of significance $\alpha$.

(ii) By using the m.g.f. approach, show that, if $X \sim$ Gamma$(\alpha, \beta)$, then
$X_1 + \cdots + X_n \sim$ Gamma$(n\alpha, \beta)$, where the $X_i$'s are independent and
distributed as $X$.

(iii) Use the CLT to carry out the test when $n = 30$, $\alpha = 10$, $\beta_1 = 2$, $\beta_2 = 3$,
and level of significance $0.05$.
(iv) Compute the power of the test, also by using the CLT.


2.6 Let $X$ be a r.v. with p.d.f. $f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}$, $x > 0$, $\theta \in \Omega = (0, \infty)$.

(i) Refer to Definition 1 in order to show that $f(\cdot; \theta)$ is of the exponential
type.

(ii) Use Theorem 2 in order to derive the UMP test for testing the hypothesis
$H_0: \theta \ge \theta_0$ against the alternative $H_A: \theta < \theta_0$ at level of
significance $\alpha$, on the basis of the random sample $X_1, \ldots, X_n$ from
the above p.d.f.

(iii) Use the m.g.f. approach in order to show that the r.v. $Y = 2 \times \left(\sum_{i=1}^n X_i\right)/\theta$
is distributed as $\chi^2_{2n}$.

(iv) Use parts (ii) and (iii) in order to find an expression for the cutoff
point $C$ and the power function of the test.

(v) If $\theta_0 = 1{,}000$ and $\alpha = 0.05$, determine the sample size $n$, so that the
power of the test at $\theta_1 = 500$ is at least 0.95.





2.7 The life of an electronic equipment is a r.v. $X$ whose p.d.f. is $f(x; \theta) = \theta e^{-\theta x}$,
$x > 0$, $\theta \in \Omega = (0, \infty)$, and let $\ell$ be its expected lifetime. On the
basis of the random sample $X_1, \ldots, X_n$ from this distribution:

(i) Derive the MP test for testing the hypothesis $H_0: \ell = \ell_0$ against the
alternative $H_A: \ell = \ell_1$ ($\ell_1 > \ell_0$) at level of significance $\alpha$, and write
the expression giving the power of the test.

(ii) Use the m.g.f. approach in order to show that the r.v. $Y = 2\theta \times \left(\sum_{i=1}^n X_i\right)$
is distributed as $\chi^2_{2n}$.

(iii) Use part (ii) in order to relate the cutoff point and the power of the
test to $\chi^2$-percentiles.

(iv) Employ the CLT (assuming that $n$ is sufficiently large) in order to
find (approximate) values for the cutoff point and the power of the
test.

(v) Use parts (iii) and (iv) in order to carry out the test and also calculate
the power when $n = 22$, $\ell_0 = 10$, $\ell_1 = 12.5$, and $\alpha = 0.05$.





2.8 Let $X$ be a r.v. whose p.d.f. $f$ is either the $U(0, 1)$, to be denoted by
$f_0$, or the Triangular over the interval $[0, 1]$, to be denoted by $f_1$ (that
is, $f_1(x) = 4x$ for $0 \le x < \frac{1}{2}$, $f_1(x) = 4 - 4x$ for $\frac{1}{2} \le x \le 1$, and 0
otherwise).

(i) Test the hypothesis $H_0: f = f_0$ against the alternative $H_A: f = f_1$ at
level of significance $\alpha = 0.05$.
(ii) Compute the power of the test.
(iii) Draw the picture of $f_1$ and compute the power by means of geometric
consideration.


2.9 The number of times that an electric light switch can be turned on
and off until failure occurs is a r.v. $X$, which may be assumed to have
the Geometric p.d.f. with parameter $\theta$; i.e., $f(x; \theta) = \theta(1 - \theta)^{x-1}$,
$x = 1, 2, \ldots$, $\theta \in \Omega = (0, 1)$.

(i) Refer to Definition 1 in order to show that $f(\cdot; \theta)$ is of the exponential
type.

(ii) Use Theorem 2 in order to derive the UMP test for testing the hypothesis
$H_0: \theta = \theta_0$ against the alternative $H_A: \theta > \theta_0$ at level of
significance $\alpha$, on the basis of a random sample of size $n$ from the p.d.f.
$f(\cdot; \theta)$.

(iii) Use the CLT to find an approximate value for the cutoff point $C$.

(iv) Carry out the test if $n = 15$, the observed sample mean $\bar{x} = 15{,}150$,
$\theta_0 = 10^{-4}$, and $\alpha = 0.05$.


2.10 Let $X$ be a r.v. with p.d.f. $f$ which is either the $P(1)$ (Poisson with $\lambda = 1$),
to be denoted by $f_0$, or the $f_1(x) = 1/2^{x+1}$, $x = 0, 1, \ldots$. For testing the
hypothesis $H_0: f = f_0$ against the alternative $H_A: f = f_1$ on the basis of
one observation $X$:

(i) Show that the rejection region is defined by: $\{x \ge 0 \text{ integer};\ 1.36 \times \frac{x!}{2^x} \ge C\}$
for some positive number $C$.

(ii) Determine the level of significance $\alpha$ of the test when $C = 2$.

Hint: Observe that the function $g(x) = \frac{x!}{2^x}$ is nondecreasing for $x$
integer $\ge 1$.


11.3 Some Applications of Theorems 2 and 3


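APPLICATION 1 The Binomial Case In reference to Example 9 with $n = 1$,
we have $T(x) = x$ where $x = 0, 1$, and then in Theorem 2, $V(x_1, \ldots, x_n) =
\sum_{i=1}^n x_i$ and $V(X_1, \ldots, X_n) = \sum_{i=1}^n X_i \sim B(n, \theta)$. Since $Q(\theta) = \log\left(\frac{\theta}{1-\theta}\right)$
is strictly increasing, consider relations (5) and (6), which, for testing the
hypothesis $H_0: \theta \le \theta_0$ against the alternative $H_A: \theta > \theta_0$, become here:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i > C \\ \gamma & \text{if } \sum_{i=1}^n x_i = C \\ 0 & \text{if } \sum_{i=1}^n x_i < C, \end{cases} \tag{19} \]

\[ E_{\theta_0}\varphi(X_1, \ldots, X_n) = P_{\theta_0}(X > C) + \gamma P_{\theta_0}(X = C) = \alpha, \quad X \sim B(n, \theta_0). \tag{20} \]

Relation (20) is rewritten below to allow the usage of the Binomial tables for
the determination of $C$ and $\gamma$; namely,

\[ P_{\theta_0}(X \le C) - \gamma P_{\theta_0}(X = C) = 1 - \alpha, \quad X \sim B(n, \theta_0). \tag{21} \]

The power of the test is:

\[ \pi_\varphi(\theta) = P_\theta(X > C) + \gamma P_\theta(X = C) = 1 - P_\theta(X \le C) + \gamma P_\theta(X = C) \quad (\theta > \theta_0), \quad X \sim B(n, \theta). \tag{22} \]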
Numerical Example Refer to Example 2 and suppose that $n = 25$, $\theta_0 = 0.125$
(so that $100p\% = 12.5\%$), and $\alpha = 0.05$.

DISCUSSION Here, for $\theta = \theta_0$, $X \sim B(25, 0.125)$, and (21) becomes

\[ P_{0.125}(X \le C) - \gamma P_{0.125}(X = C) = 0.95. \]

From the Binomial tables, the value of $C$ which renders $P_{0.125}(X \le C)$ just
above 0.95 is 6 and $P_{0.125}(X \le 6) = 0.9703$. Also, $P_{0.125}(X = 6) = 0.9703 -
0.9169 = 0.0534$, so that $\gamma = \frac{0.9703 - 0.95}{0.0534} = \frac{0.0203}{0.0534} \simeq 0.38$. Thus, the test in (19)
is:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } x > 6 \\ 0.38 & \text{if } x = 6 \\ 0 & \text{if } x < 6. \end{cases} \]

Reject outright the hypothesis that $100p\% = 12.5\%$ if the number of listeners
among the sample of 25 is 7 or more, reject the hypothesis with probability
0.38 if this number is 6, and accept the hypothesis if this number is 5 or smaller.
The power of the test is calculated to be as follows, by relation (22):

\[ \pi_\varphi(0.1875) = 1 - 0.8261 + 0.38 \times 0.1489 \simeq 0.230, \]
\[ \pi_\varphi(0.25) = 1 - 0.5611 + 0.38 \times 0.1828 \simeq 0.508, \]
\[ \pi_\varphi(0.375) = 1 - 0.1156 + 0.38 \times 0.0652 \simeq 0.909. \]

If we suppose that the observed value of $X$ is 7, then the P-value is:

\[ 1 - P_{0.125}(X \le 7) + 0.38 P_{0.125}(X = 7) = 1 - 0.9910 + 0.38 \times 0.0207 \simeq 0.017, \]

so that the result is statistically significant.
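The table look-up above can be reproduced by direct computation. The following Python sketch (the helper names are illustrative assumptions, not notation from the text) finds the smallest $C$ with $P_{\theta_0}(X \le C) \ge 1 - \alpha$, solves relation (21) for $\gamma$, and evaluates the power through relation (22).

```python
import math

def binom_pmf(k, n, theta):
    """P(X = k) for X ~ B(n, theta)."""
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def binom_cdf(k, n, theta):
    """P(X <= k) for X ~ B(n, theta)."""
    return sum(binom_pmf(j, n, theta) for j in range(k + 1))

def upper_tail_test(n, theta0, alpha):
    """Return (C, gamma) satisfying P(X <= C) - gamma * P(X = C) = 1 - alpha under theta0."""
    for C in range(n + 1):
        cdf = binom_cdf(C, n, theta0)
        if cdf >= 1 - alpha:
            gamma = (cdf - (1 - alpha)) / binom_pmf(C, n, theta0)
            return C, gamma
    raise ValueError("no cutoff found")

def power(theta, n, C, gamma):
    """Relation (22): 1 - P(X <= C) + gamma * P(X = C) under theta."""
    return 1 - binom_cdf(C, n, theta) + gamma * binom_pmf(C, n, theta)

C, gamma = upper_tail_test(25, 0.125, 0.05)
print(C, round(gamma, 2))
for theta in (0.1875, 0.25, 0.375):
    print(theta, round(power(theta, 25, C, gamma), 3))
```

Because $\gamma$ is computed here without the intermediate rounding of the tables, the resulting powers may differ from the values in the text in the third decimal place.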
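Next, for testing the hypothesis $H_0: \theta \ge \theta_0$ against the alternative $H_A: \theta < \theta_0$,
relations (8) and (9) become:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i < C \\ \gamma & \text{if } \sum_{i=1}^n x_i = C \\ 0 & \text{if } \sum_{i=1}^n x_i > C, \end{cases} \tag{23} \]

and

\[ P_{\theta_0}(X \le C - 1) + \gamma P_{\theta_0}(X = C) = \alpha, \quad X = \sum_{i=1}^n X_i \sim B(n, \theta_0). \tag{24} \]

The power of the test is:

\[ \pi_\varphi(\theta) = P_\theta(X \le C - 1) + \gamma P_\theta(X = C) \quad (\theta < \theta_0), \quad X \sim B(n, \theta). \tag{25} \]

Numerical Example Refer to Example 1 and suppose that $n = 25$,
$\theta_0 = 0.0625$, and $\alpha = 0.05$.


DISCUSSION Here, under $\theta_0$, $X \sim B(25, 0.0625)$, and (24) becomes

\[ P_{0.0625}(X \le C - 1) + \gamma P_{0.0625}(X = C) = 0.05, \]

so that $C = 0$, and $\gamma P_{0.0625}(X = 0) = 0.1992\gamma = 0.05$. It follows that $\gamma \simeq 0.251$.
Therefore the hypothesis is rejected with probability 0.251, if $x = 0$, and is
accepted otherwise.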


APPLICATION 2 The Poisson Case In reference to Example 10, we have
$T(x) = x$, $x = 0, 1, \ldots$, and then in Theorem 2, $V(x_1, \ldots, x_n) = \sum_{i=1}^n x_i$ and
$V(X_1, \ldots, X_n) = \sum_{i=1}^n X_i \sim P(n\theta)$. Since $Q(\theta) = \log\theta$ is strictly increasing,
consider relations (8) and (9) for testing $H_0: \theta \ge \theta_0$ against $H_A: \theta < \theta_0$. They
become here:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i < C \\ \gamma & \text{if } \sum_{i=1}^n x_i = C \\ 0 & \text{if } \sum_{i=1}^n x_i > C, \end{cases} \tag{26} \]

\[ E_{\theta_0}\varphi(X_1, \ldots, X_n) = P_{\theta_0}(X < C) + \gamma P_{\theta_0}(X = C) = \alpha, \quad X = \sum_{i=1}^n X_i \sim P(n\theta_0), \]

or

\[ P_{\theta_0}(X \le C - 1) + \gamma P_{\theta_0}(X = C) = \alpha, \quad X \sim P(n\theta_0). \tag{27} \]

The power of the test is:

\[ \pi_\varphi(\theta) = P_\theta(X \le C - 1) + \gamma P_\theta(X = C) \quad (\theta < \theta_0), \quad X \sim P(n\theta). \tag{28} \]

Unfortunately, no numerical application for Example 3 can be given as the
Poisson tables do not provide entries for $\theta_0 = 200$. In order to be able to apply
the test defined by (26) and (27), consider the following example.


Let $X_1, \ldots, X_{20}$ be i.i.d. r.v.'s denoting the number of typographical errors in 20
pages of a book. We may assume that the $X_i$'s are independently distributed as
$P(\theta)$, and let us test the hypothesis $H_0: \theta > 0.5$ (the average number of errors
is more than 1 per couple of pages) against the alternative $H_A: \theta \le 0.5$ at level
$\alpha = 0.05$.


DISCUSSION In (27), $X \sim P(10)$, so that

\[ P_{0.5}(X \le C - 1) + \gamma P_{0.5}(X = C) = 0.05, \]

and hence $C - 1 = 4$ and $P_{0.5}(X \le 4) = 0.0293$, $P_{0.5}(X = 5) = 0.0671 - 0.0293 =
0.0378$. It follows that $\gamma = \frac{0.05 - 0.0293}{0.0378} \simeq 0.548$. Therefore by (26), reject the
hypothesis outright if $x \le 4$, reject it with probability 0.548 if $x = 5$, and
accept it otherwise. The power of the test is: For $\theta = 0.2$, $X \sim P(4)$ and:

\[ \pi_\varphi(0.2) = P_{0.2}(X \le 4) + 0.548 P_{0.2}(X = 5) = 0.6288 + 0.548 \times 0.1563 \simeq 0.714. \]

If the observed value $x$ is 6, then the P-value is (for $X \sim P(10)$): $P_{0.5}(X \le 5) +
0.548 P_{0.5}(X = 6) \simeq 0.102$.
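The constants of the Poisson test can likewise be obtained without tables. The sketch below (Python; the function names are hypothetical) takes the mean $n\theta_0$ of $X = \sum_i X_i$ under $\theta_0$, finds the smallest $C$ with $P(X \le C) > \alpha$, solves relation (27) for $\gamma$, and computes the power from relation (28).

```python
import math

def pois_pmf(k, mean):
    """P(X = k) for X ~ P(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def pois_cdf(k, mean):
    """P(X <= k) for X ~ P(mean); an empty sum (k < 0) gives 0."""
    return sum(pois_pmf(j, mean) for j in range(k + 1))

def lower_tail_test(mean0, alpha):
    """Return (C, gamma) with P(X <= C - 1) + gamma * P(X = C) = alpha under mean0."""
    C = 0
    while pois_cdf(C, mean0) <= alpha:
        C += 1
    gamma = (alpha - pois_cdf(C - 1, mean0)) / pois_pmf(C, mean0)
    return C, gamma

def power(mean, C, gamma):
    """Relation (28), with mean = n * theta."""
    return pois_cdf(C - 1, mean) + gamma * pois_pmf(C, mean)

n = 20
C, gamma = lower_tail_test(n * 0.5, 0.05)
print(C, round(gamma, 3))
print(round(power(n * 0.2, C, gamma), 3))
```

The same functions apply to any $n\theta_0$, including values beyond the range of printed Poisson tables, although for very large means the factorials grow quickly and a normal approximation may be preferable.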


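APPLICATION 3 The Normal Case: Testing Hypotheses About the
Mean Refer to Example 11 and observe that $T(x) = x$ and $Q(\theta)$ is strictly
increasing. Therefore the appropriate test for testing $H_0: \theta \le \theta_0$ against $H_A$:
$\theta > \theta_0$ at level of significance $\alpha$ is given by (5) and (6) with $\gamma = 0$. That is,

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i > C \\ 0 & \text{otherwise,} \end{cases} \tag{29} \]

or

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \frac{\sqrt{n}(\bar{x} - \theta_0)}{\sigma} > z_\alpha \\ 0 & \text{otherwise,} \end{cases} \tag{29$'$} \]

because

\[ \alpha = E_{\theta_0}\varphi(X_1, \ldots, X_n) = P_{\theta_0}\left(\sum_{i=1}^n X_i > C\right) = P_{\theta_0}\left[\frac{\sum_{i=1}^n X_i - n\theta_0}{\sigma\sqrt{n}} > \frac{C - n\theta_0}{\sigma\sqrt{n}}\right], \]

so that $\frac{C - n\theta_0}{\sigma\sqrt{n}} = z_\alpha$, and therefore

\[ C = n\theta_0 + z_\alpha\sigma\sqrt{n}; \tag{30} \]

this is so, because $\frac{\sum_{i=1}^n X_i - n\theta_0}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \theta_0)}{\sigma} = Z \sim N(0, 1)$, and recall that $P(Z > z_\alpha) = \alpha$.

Figure 11.5 Rejection Region of the Hypothesis $H_0: \theta \le \theta_0$ (the Shaded Area) in the Form (29$'$) and the Respective Probability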




















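The power of the test is given by:

\[ \pi_\varphi(\theta) = 1 - \Phi\left[z_\alpha + \frac{\sqrt{n}(\theta_0 - \theta)}{\sigma}\right], \quad \theta > \theta_0, \tag{31} \]

because, on account of (30):

\[ \pi_\varphi(\theta) = P_\theta\left(\sum_{i=1}^n X_i > C\right) = P_\theta\left[\sum_{i=1}^n X_i - n\theta > n(\theta_0 - \theta) + z_\alpha\sigma\sqrt{n}\right] \]
\[ = P_\theta\left[\frac{\sum_{i=1}^n X_i - n\theta}{\sigma\sqrt{n}} > z_\alpha + \frac{\sqrt{n}(\theta_0 - \theta)}{\sigma}\right] = P\left[Z > z_\alpha + \frac{\sqrt{n}(\theta_0 - \theta)}{\sigma}\right] = 1 - \Phi\left[z_\alpha + \frac{\sqrt{n}(\theta_0 - \theta)}{\sigma}\right], \]

since $\frac{\sum_{i=1}^n X_i - n\theta}{\sigma\sqrt{n}} = Z \sim N(0, 1)$.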


Numerical Example In reference to Example 5, focus on patients treated
with the new treatment, and call $Y \sim N(\theta, \sigma^2)$ ($\sigma$ known) the survival time.
On the basis of observations on $n = 25$ such patients, we wish to test the
hypothesis $H_0: \theta \le 5$ (in years) against $H_A: \theta > 5$ at level of significance $\alpha =
0.01$. For simplicity, take $\sigma = 1$.


DISCUSSION Here $z_\alpha = z_{0.01} = 2.33$, so that $C = 25 \times 5 + 2.33 \times 1 \times 5 =
136.65$. Thus, reject $H_0$ if the total of survival years is $> 136.65$, and accept $H_0$
otherwise.

The power of the test is given by (31) and is:

For $\theta = 5.5$, $\pi_\varphi(5.5) = 1 - \Phi[2.33 + 5(5 - 5.5)] = 1 - \Phi(-0.17)
= \Phi(0.17) = 0.567495$;

and for $\theta = 6$, $\pi_\varphi(6) = 1 - \Phi[2.33 + 5(5 - 6)] = 1 - \Phi(-2.67)
= \Phi(2.67) = 0.996207$.

If we suppose that the observed value of $\sum_{i=1}^{25} x_i$ is equal to 138, then the
P-value is $P_5\left(\sum_{i=1}^{25} X_i > 138\right) = 1 - \Phi\left(\frac{138 - 125}{5}\right) = 1 - \Phi(2.6) = 1 - 0.995339 =
0.004661$, so that the result is highly statistically significant.
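Relations (30) and (31) and the P-value computation are easy to evaluate directly. A minimal Python sketch follows, using `statistics.NormalDist` from the standard library for $\Phi$; the function names are illustrative, and $z_\alpha$ is passed in explicitly so that the rounded table value 2.33 used in the text can be reproduced (the unrounded value is `NormalDist().inv_cdf(0.99)`, about 2.326).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, provides cdf (Phi) and inv_cdf

def cutoff(n, theta0, sigma, z_alpha):
    """Relation (30): C = n*theta0 + z_alpha * sigma * sqrt(n)."""
    return n * theta0 + z_alpha * sigma * sqrt(n)

def power(theta, n, theta0, sigma, z_alpha):
    """Relation (31): 1 - Phi[z_alpha + sqrt(n) (theta0 - theta) / sigma]."""
    return 1 - Z.cdf(z_alpha + sqrt(n) * (theta0 - theta) / sigma)

def p_value(total, n, theta0, sigma):
    """P_{theta0}(sum of X_i > total), the sum being N(n*theta0, n*sigma^2)."""
    return 1 - Z.cdf((total - n * theta0) / (sigma * sqrt(n)))

n, theta0, sigma, z = 25, 5.0, 1.0, 2.33
print(cutoff(n, theta0, sigma, z))
print(power(5.5, n, theta0, sigma, z), power(6.0, n, theta0, sigma, z))
print(p_value(138, n, theta0, sigma))
```

Evaluating `power` on a grid of $\theta > \theta_0$ traces out a power curve of the increasing shape described for the one-sided test.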





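APPLICATION 4 The Normal Case (continued) Testing Hypotheses
About the Variance Refer to Example 12, where $T(x) = (x - \mu)^2$ ($\mu$ known)
and $Q$ is strictly increasing. Then, for testing the hypothesis $H_0: \sigma^2 \ge \sigma_0^2$ against
the alternative $H_A: \sigma^2 < \sigma_0^2$ (or $\theta \ge \theta_0$ against $\theta < \theta_0$ with $\theta = \sigma^2$ and $\theta_0 = \sigma_0^2$)
at level of significance $\alpha$, the appropriate test is given by (8) and (9) (with
$\gamma = 0$), and it is here:

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n (x_i - \mu)^2 < C \\ 0 & \text{otherwise,} \end{cases} \tag{32} \]

or

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2 < \chi^2_{n;1-\alpha} \\ 0 & \text{otherwise,} \end{cases} \tag{32$'$} \]

because

\[ \alpha = E_{\sigma_0^2}\varphi(X_1, \ldots, X_n) = P_{\sigma_0^2}\left[\sum_{i=1}^n (X_i - \mu)^2 < C\right] = P_{\sigma_0^2}\left[\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma_0}\right)^2 < \frac{C}{\sigma_0^2}\right], \]

so that $\frac{C}{\sigma_0^2} = \chi^2_{n;1-\alpha}$, and therefore

\[ C = \sigma_0^2\chi^2_{n;1-\alpha}; \tag{33} \]

this is so, because $\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma_0}\right)^2 \sim \chi^2_n$.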


Figure 11.6 Rejection Region of the Hypothesis $H_0: \sigma^2 \ge \sigma_0^2$ (the Shaded Area) in the Form (32$'$) and the Respective Probability














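By slightly abusing the notation and denoting by $\chi^2_n$ also a r.v. which has
the $\chi^2_n$ distribution, the power of the test is given by:

\[ \pi_\varphi(\sigma^2) = P\left(\chi^2_n < \frac{\sigma_0^2}{\sigma^2}\chi^2_{n;1-\alpha}\right), \quad \sigma^2 < \sigma_0^2, \tag{34} \]

because, on account of (33):

\[ \pi_\varphi(\sigma^2) = P_{\sigma^2}\left[\sum_{i=1}^n (X_i - \mu)^2 < C\right] = P_{\sigma^2}\left[\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 < \frac{C}{\sigma^2}\right] = P\left(\chi^2_n < \frac{\sigma_0^2}{\sigma^2}\chi^2_{n;1-\alpha}\right), \]

since $\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$.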


Numerical Example Suppose $n = 40$, $\sigma_0 = 2$, and $\alpha = 0.025$. For simplicity,
take $\mu = 0$.


DISCUSSION Here $\chi^2_{n;1-\alpha} = \chi^2_{40;0.975} = 24.433$ and $C = 4 \times 24.433 =
97.732$. Thus, by means of (32), the hypothesis is rejected if $\sum_{i=1}^{40} x_i^2 < 97.732$,
and it is accepted otherwise.

For $\sigma = 1.25$, for example, the power of the test is, by means of (34),

\[ \pi_\varphi(\sigma^2) = \pi_\varphi(1.5625) = P\left(\chi^2_{40} < \frac{97.732}{1.5625}\right) = P(\chi^2_{40} < 62.548) = 0.986 \]

(by linear interpolation).
If we suppose that the observed value of $\sum_{i=1}^{40} x_i^2$ is 82.828, then the P-value
is

\[ P_4\left(\sum_{i=1}^{40} X_i^2 < 82.828\right) = P_4\left[\sum_{i=1}^{40} \left(\frac{X_i}{2}\right)^2 < 20.707\right] = 0.005, \]

which indicates strong rejection.
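The $\chi^2$ probabilities in this example can be computed exactly when the number of degrees of freedom is even, because then $P(\chi^2_{2k} \le x) = 1 - e^{-x/2}\sum_{j=0}^{k-1}\frac{(x/2)^j}{j!}$. The Python sketch below relies on this identity (the function name is an illustrative assumption) to check the cutoff, the power from relation (34), and the P-value, with the percentile 24.433 taken from the text.

```python
import math

def chi2_cdf_even(x, df):
    """P(chi^2_df <= x) for even df, via the finite Poisson-type sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

n, sigma0_sq = 40, 4.0
C = sigma0_sq * 24.433                      # relation (33), percentile from the tables
print(chi2_cdf_even(C / sigma0_sq, n))      # level of the test
print(chi2_cdf_even(C / 1.25**2, n))        # power at sigma = 1.25, relation (34)
print(chi2_cdf_even(82.828 / sigma0_sq, n)) # P-value for the observed sum 82.828
```

The exact power should come out close to the value 0.986 obtained in the text by linear interpolation in the tables.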


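APPLICATION 5 The Normal Case (continued) Testing Further
Hypotheses About the Mean In reference to Example 11, $T(x) = x$ and
$Q(\theta)$ is strictly increasing. Therefore, for testing $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$ against
$H_A: \theta_1 < \theta < \theta_2$ at level of significance $\alpha$, the test to be employed is the one
given by (11) and (12)-(13), which here becomes ($\gamma_1 = \gamma_2 = 0$):

\[ \varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } C_1 < \sum_{i=1}^n x_i < C_2 \\ 0 & \text{otherwise,} \end{cases} \tag{35} \]

\[ E_{\theta_1}\varphi(X_1, \ldots, X_n) = P_{\theta_1}\left(C_1 < \sum_{i=1}^n X_i < C_2\right) = \alpha, \quad E_{\theta_2}\varphi(X_1, \ldots, X_n) = P_{\theta_2}\left(C_1 < \sum_{i=1}^n X_i < C_2\right) = \alpha, \tag{36} \]

and $\sum_{i=1}^n X_i \sim N(n\theta_i, n\sigma^2)$, $i = 1, 2$.

For the purpose of utilizing the Normal tables, (36) are rewritten thus:

\[ \Phi\left(\frac{C_2 - n\theta_i}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{C_1 - n\theta_i}{\sigma\sqrt{n}}\right) = \alpha, \quad i = 1, 2. \tag{37} \]

The power of the test is calculated as follows:

\[ \pi_\varphi(\theta) = \Phi\left(\frac{C_2 - n\theta}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{C_1 - n\theta}{\sigma\sqrt{n}}\right) \quad (\theta_1 < \theta < \theta_2). \tag{38} \]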


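Numerical Example In reference to Example 4, suppose $n = 25$, $\theta_1 =
1$, $\theta_2 = 3$, and $\alpha = 0.01$. For simplicity, let us take $\sigma = 1$.


DISCUSSION Here $n\theta_1 = 25$, $n\theta_2 = 75$, and (37) yields:

\[ \Phi\left(\frac{C_2 - 25}{5}\right) - \Phi\left(\frac{C_1 - 25}{5}\right) = \Phi\left(\frac{C_2 - 75}{5}\right) - \Phi\left(\frac{C_1 - 75}{5}\right) = 0.01. \tag{39} \]

Placing the four quantities $\frac{C_1 - 75}{5}$, $\frac{C_2 - 75}{5}$, $\frac{C_1 - 25}{5}$, and $\frac{C_2 - 25}{5}$ under the $N(0, 1)$
curve, we observe that relation (39) obtains only for:

\[ \frac{C_1 - 25}{5} = -\frac{C_2 - 75}{5} \quad \text{and} \quad \frac{C_2 - 25}{5} = -\frac{C_1 - 75}{5}, \]

which imply that $C_1 + C_2 = 100$. Setting $C_1 = C$, we have then that $C_2 = 100 - C$,
and (39) gives:

\[ \Phi\left(\frac{75 - C}{5}\right) - \Phi\left(\frac{C - 25}{5}\right) = 0.01. \tag{40} \]

From the Normal tables, we find that (40) is closely satisfied for $C = 36.5$.
So $C_1 = 36.5$ and hence $C_2 = 63.5$, and the test rejects the hypothesis $H_0$
whenever $\sum_{i=1}^{25} x_i$ is between 36.5 and 63.5 and accepts it otherwise.

The power of the test, calculated through (38), is, for example, for $\theta = 2.5$
and $\theta = 2$:

\[ \pi_\varphi(1.5) = \pi_\varphi(2.5) = 0.57926 \quad \text{and} \quad \pi_\varphi(2) = 0.99307. \]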
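Relation (40) can also be solved numerically instead of by inspection of the Normal tables. The sketch below (Python, standard library only; the function names and the bisection bracket $[25, 50]$ are choices made for this illustration) finds $C$ by bisection, using the fact that the left-hand side of (40) is decreasing in $C$, and then evaluates the power (38) with the rounded cutoffs used in the text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal; Z.cdf is Phi

def size_gap(C, alpha=0.01):
    """Left-hand side of (40) minus alpha, for n*theta1 = 25, n*theta2 = 75, sigma*sqrt(n) = 5."""
    return Z.cdf((75 - C) / 5) - Z.cdf((C - 25) / 5) - alpha

def solve_C(lo=25.0, hi=50.0, tol=1e-10):
    """Bisection: size_gap is positive at lo, negative at hi, and decreasing in C."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if size_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power(theta, C1, C2, n=25, sigma=1.0):
    """Relation (38)."""
    s = sigma * n**0.5
    return Z.cdf((C2 - n * theta) / s) - Z.cdf((C1 - n * theta) / s)

C = solve_C()
print(round(C, 2), round(100 - C, 2))
print(power(2.5, 36.5, 63.5), power(2.0, 36.5, 63.5))
```

The numerical root lies slightly above the table-based value $C = 36.5$, which is consistent with the statement that (40) is only closely, not exactly, satisfied there.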


3.1 (i) In reference to Example 8 in Chapter 1, the appropriate model is
the Binomial model with $X_i = 1$ if the $i$th young adult listens to the
program, and $X_i = 0$ otherwise, where $P(X_i = 1) = p$, and the $X_i$'s
are independent, so that $X = \sum_{i=1}^n X_i \sim B(n, p)$.

(ii) The claim is that $p > p_0$, some specified number $0 < p_0 < 1$, and the
claim is checked by testing the hypothesis $H_0: p \le p_0$ against the
alternative $H_A: p > p_0$ at level of significance $\alpha$.

(iii) For $p_0 = 5\%$, $n = 100$, and $\alpha = 0.02$, use the CLT to carry out the
test.


3.2 (i) In reference to Example 9 in Chapter 1, the appropriate model is
the Binomial model with $X_i = 1$ if the $i$th item is defective, and 0
otherwise, where $P(X_i = 1) = p$, and the $X_i$'s are independent, so
that $X = \sum_{i=1}^n X_i \sim B(n, p)$.

(ii) The process is under control if $p < p_0$, where $p_0$ is a specified
number with $0 < p_0 < 1$, and the hypothesis to be checked is $H_0$:
$p \ge p_0$ against the alternative $H_A: p < p_0$ at level of significance $\alpha$.

(iii) For $p_0 = 0.0625$, $n = 100$, and $\alpha = 0.10$, use the CLT to carry out the
test.


3.3 (i) In reference to Example 10 in Chapter 1, the appropriate model is the
Binomial model with $X_i = 1$ if the $i$th flawed specimen is identified
as such, and $X_i = 0$, otherwise, where $P(X_i = 1) = p$, and the $X_i$'s
are independent, so that $X = \sum_{i=1}^n X_i \sim B(n, p)$.

(ii) The electronic scanner is superior to the mechanical testing if $p >
p_0$, some specified $p_0$ with $0 < p_0 < 1$, and this is checked by testing
the hypothesis $H_0: p \le p_0$ against the alternative $H_A: p > p_0$ at level
of significance $\alpha$.

(iii) For $p_0 = 90\%$, $n = 100$, and $\alpha = 0.05$, use the CLT to carry out the
test.


3.4 (i) In a certain university, 400 students were chosen at random and
it was found that 95 of them were women. On the basis of this,
test the hypothesis $H_0$: the proportion of women is 25% against the
alternative $H_A$: the proportion of women is less than 25% at level of
significance $\alpha = 0.05$.

(ii) Use the CLT in order to determine the cutoff point.

3.5 Let $X_1, \ldots, X_n$ be independent r.v.'s distributed as $B(1, p)$. For testing the
hypothesis $H_0: p \le \frac{1}{2}$ against the alternative $H_A: p > \frac{1}{2}$, use the CLT in
order to determine the sample size $n$ for which the level of significance
and power are, respectively, $\alpha = 0.05$ and $\pi(7/8) = 0.95$.


3.6 Let $X$ be a r.v. distributed as $B(n, \theta)$, $\theta \in \Omega = (0, 1)$.

(i) Use relations (19) and (20) to set up the UMP test for testing the
hypothesis $H_0: \theta \le \theta_0$ against the alternative $H_A: \theta > \theta_0$ at level of
significance $\alpha$.

(ii) Specify the test in part (i) for $n = 10$, $\theta_0 = 0.25$, and $\alpha = 0.05$.

(iii) Compute the power of the test for $\theta_1 = 0.375, 0.500$.

(iv) For $\theta > 0.5$, show that: $P_\theta(X \le C) = 1 - P_{1-\theta}(X \le n - C - 1)$ and
hence $P_\theta(X = C) = P_{1-\theta}(X \le n - C) - P_{1-\theta}(X \le n - C - 1)$.

(v) Use part (iv) to compute the power of the test for $\theta_1 = 0.625, 0.875$.

(vi) Use the CLT in order to determine the sample size $n$ if $\theta_0 = 0.125$,
$\alpha = 0.1$, and $\pi(0.25) = 0.9$.


3.7 (i) In reference to Example 12 in Chapter 1, the appropriate model to
be used is the Poisson model; i.e., $X \sim P(\lambda)$.
(ii) The safety level is specified by $\lambda \le 200$, and this is checked by testing
the hypothesis $H_0: \lambda > 200$ against the alternative $H_A: \lambda \le 200$ at
level of significance $\alpha$.
(iii) On the basis of a random sample of size $n = 100$, use the CLT in
order to carry out the test for $\alpha = 0.05$.

3.8 The number of total traffic accidents in a certain city during a year is a
r.v. $X$, which may be assumed to be distributed as $P(\lambda)$. For the last year,
the observed value of $X$ was $x = 4$, whereas for the past several years,
the average was 10.

(i) Formulate the hypothesis that the average remains the same against
the alternative that there is an improvement.

(ii) Refer to Application 2 in order to derive the UMP test for testing the
hypothesis of part (i) at level $\alpha = 0.01$.


3.9 (i) In reference to Example 16 in Chapter 1, a suitable model would
be to assume that $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$ and that they are
independent.

(ii) Let $\mu_0$ be the known mean survival period (in years) for the existing
treatment. Then the claim is that $\mu_2 > \mu_0$, and this is to be checked by
testing the hypothesis $H_0: \mu_2 \le \mu_0$ against the alternative $H_A: \mu_2 >
\mu_0$ at level of significance $\alpha$.

(iii) Carry out the test if $n = 100$, $\mu_0 = 5$, and $\alpha = 0.05$.


3.10 The life length of a 50-watt light bulb of a certain brand is a r.v. $X$, which
may be assumed to be distributed as $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma$
known. Let $X_1, \ldots, X_n$ be a random sample from this distribution and
suppose that we are interested in testing the hypothesis $H_0: \mu = \mu_0$ against
the alternative $H_A: \mu < \mu_0$ at level of significance $\alpha$.

(i) Derive the UMP test.
(ii) Derive the formula for the power of the test.
(iii) Carry out the testing hypothesis problem when $n = 25$, $\mu_0 = 1{,}800$,
$\sigma = 150$ (in hours), $\alpha = 0.01$, and $\bar{x} = 1{,}730$. Also, calculate the power
at $\mu = 1{,}700$.


3.11 The rainfall at a certain station during a year is a r.v. $X$, which may be
assumed to be distributed as $N(\mu, \sigma^2)$ with $\mu$ unknown and $\sigma = 3$ inches.
For the past 10 years, the record provides the following rainfalls:

$x_1 = 30.5$, $x_2 = 34.1$, $x_3 = 27.9$, $x_4 = 29.4$, $x_5 = 35.0$,
$x_6 = 26.9$, $x_7 = 30.2$, $x_8 = 28.3$, $x_9 = 31.7$, $x_{10} = 25.8$.

Test the hypothesis $H_0: \mu = 30$ against the alternative $H_A: \mu < 30$ at level
of significance $\alpha = 0.05$.


3.12 Let $X_i$, $i = 1, \ldots, 4$ and $Y_j$, $j = 1, \ldots, 4$ be two independent random
samples from the distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively.
Suppose that the observed values of the $X_i$'s and the $Y_j$'s are as follows:

$x_1 = 10.1$, $x_2 = 8.4$, $x_3 = 14.3$, $x_4 = 11.7$,
$y_1 = 9.0$, $y_2 = 8.2$, $y_3 = 12.1$, $y_4 = 10.3$.

Suppose that $\sigma_1 = 4$ and $\sigma_2 = 3$.

Then test the hypothesis that the two means differ in absolute value by
at least 1 unit. That is, if $\theta = \mu_1 - \mu_2$, then the hypothesis to be tested
is $H_0: |\theta| \ge 1$, or, equivalently, $H_0: \theta \le -1$ or $\theta \ge 1$. The alternative is
$H_A: -1 < \theta < 1$. Take $\alpha = 0.05$.

Hint: Set $Z_i = X_i - Y_i$, so that the $Z_i$'s are independent and distributed
as $N(\mu, 25)$, where $\mu = \mu_1 - \mu_2$. Then use appropriately Theorem 3.


3.13 (i) On the basis of the independent r.v.'s $X_1, \ldots, X_{25}$, distributed as $N(0,
\sigma^2)$, test the hypothesis $H_0: \sigma \le 2$ against the alternative $H_A: \sigma > 2$
at level of significance $\alpha = 0.05$.
(ii) Specify the test when the observed values $x_i$'s of the $X_i$'s are such
that $\sum_{i=1}^{25} x_i^2 = 120$.


3.14 The diameters of bolts produced by a certain machine are independent
r.v.'s distributed as $N(\mu, \sigma^2)$ with $\mu$ known. In order for the bolts to be
usable for the intended purpose, the s.d. $\sigma$ must not exceed 0.04 inch. A
random sample of size $n = 16$ is taken and it is found that $s = 0.05$ inch.
Formulate the appropriate testing hypothesis problem and carry out the
test at level of significance $\alpha = 0.05$.


11.4 Likelihood Ratio Tests


In the previous sections, UMP tests were constructed for several important 
hypotheses and were illustrated by specific examples. Those tests have the 
UMP property, provided the underlying p.d.f. is of the exponential type given 
in (4). What happens if either the p.d.f. is not of this form and/or the hypotheses 
to be tested are not of the type for which UMP tests exist? One answer is for 
sure that the testing activities will not be terminated here; other procedures 
are to be invented and investigated. Such a procedure is one based on the 
Likelihood Ratio, which gives rise to the so-called Likelihood Ratio (LR) tests. 
The rationale behind this procedure was given in Section 3 of Chapter 8. What 
we are doing in this section is to apply it to some specific cases and produce 
the respective LR tests in a usable form. 

As already explained, LR tests do have a motivation which is, at least intu- 
itively, satisfactory, although they do not possess, in general, a property such as 
the UMP property. The LR approach also applies to multidimensional parame- 
ters and leads to manageable tests. In addition, much of the work needed to set 
up a LR test has already been done in Section 1 of Chapter 9 about MLE's. In our 
discussions below, we restrict ourselves to the Normal case, where exact tests 
do exist. In the next chapter, we proceed with the Multinomial distribution, 
where we have to be satisfied with approximations. 

The basics here, as we recall from Chapter 8, Section 8.3, are as follows:
$X_1, \ldots, X_n$ is a random sample from the p.d.f. $f(\cdot; \boldsymbol{\theta})$, $\boldsymbol{\theta} \in \Omega \subseteq \mathbb{R}^r$, $r \ge 1$,
and $\omega$ is a (proper) subset of $\Omega$. On the basis of this random sample, test the
hypothesis $H_0: \boldsymbol{\theta} \in \omega$ at level of significance $\alpha$. (In the present framework, the
alternative is $H_A: \boldsymbol{\theta} \notin \omega$, but is not explicitly stated.) Then, by relation (8) in
Chapter 8 and the discussion following it, reject $H_0$ whenever

\[ \lambda < \lambda_0, \text{ where } \lambda_0 \text{ is a constant to be specified,} \tag{41} \]

or

\[ g(\lambda) > g(\lambda_0), \text{ or } g(\lambda) < g(\lambda_0), \text{ for some strictly monotone function } g; \tag{42} \]

$g(\lambda) = -2\log\lambda$ is such a function, and $H_0$ may be rejected whenever

\[ -2\log\lambda > C, \text{ a constant to be determined} \tag{43} \]

(see relation (10) in Chapter 8).
Recall that

\[ \lambda = \lambda(x_1, \ldots, x_n) = \frac{L(\hat{\omega})}{L(\hat{\Omega})}, \tag{44} \]

where, with $\boldsymbol{x} = (x_1, \ldots, x_n)$, the observed value of $(X_1, \ldots, X_n)$, $L(\hat{\Omega})$ is the
maximum of the likelihood function $L(\boldsymbol{\theta} \mid \boldsymbol{x})$, which obtains if $\boldsymbol{\theta}$ is replaced by
its MLE, and $L(\hat{\omega})$ is again the maximum of the likelihood function under the
restriction that $\boldsymbol{\theta}$ lies in $\omega$. Clearly, $L(\hat{\omega}) = L(\hat{\boldsymbol{\theta}}_\omega)$, where $\hat{\boldsymbol{\theta}}_\omega$ is the MLE of $\boldsymbol{\theta}$
under the restriction that $\boldsymbol{\theta}$ lies in $\omega$. Actually, much of the difficulty associated
with the present method stems from the fact that, in practice, obtaining $\hat{\boldsymbol{\theta}}_\omega$ is
far from a trivial problem.

The following two examples shed some light on how an LR test is actually
constructed. These examples are followed by a series of applications to Normal
populations.


EXAMPLE 14

Determine the LR test for testing the hypothesis $H_0$: $\theta = 0$ (against the
alternative $H_A$: $\theta \ne 0$) at level of significance $\alpha$ on the basis of one observation from
the p.d.f. $f(x; \theta) = \frac{1}{\pi} \times \frac{1}{1 + (x - \theta)^2}$, $x \in \Re$, $\theta \in \Re$ (the Cauchy p.d.f.).





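DISCUSSION First, $f(\cdot; \theta)$ is a p.d.f., since

$$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{dx}{1 + (x - \theta)^2}
= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{dy}{1 + y^2} \quad (\text{by setting } x - \theta = y)$$

$$= \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} dt = 1$$

(by setting $y = \tan t$, so that $1 + y^2 = 1 + \frac{\sin^2 t}{\cos^2 t} = \frac{1}{\cos^2 t}$,
$\frac{dy}{dt} = \frac{d}{dt}\left(\frac{\sin t}{\cos t}\right) = \frac{1}{\cos^2 t}$, and $-\frac{\pi}{2} < t < \frac{\pi}{2}$).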








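Next, clearly, $L(\theta \mid x)$ $(= f(x; \theta))$ is maximized for $\theta = x$, so that
$\lambda = \left(\frac{1}{\pi} \times \frac{1}{1 + x^2}\right) \big/ \frac{1}{\pi} = \frac{1}{1 + x^2}$, and $\lambda < \lambda_0$ if and only if
$x^2 > \frac{1}{\lambda_0} - 1 = C^2$, or $x < -C$ or $x > C$,
where $C > 0$ is determined through the relation $P_0(X < -C \text{ or } X > C) = \alpha$, or
$P_0(X > C) = \frac{\alpha}{2}$ due to the symmetry (around 0) of the p.d.f. $f(x; 0)$. But

$$P_0(X > C) = \int_C^{\infty} \frac{1}{\pi} \times \frac{dx}{1 + x^2}
= \frac{1}{\pi}\int_{\tan^{-1} C}^{\pi/2} dt
= \frac{1}{\pi}\left(\frac{\pi}{2} - \tan^{-1} C\right) = \frac{\alpha}{2},$$

or $\tan^{-1} C = \frac{(1 - \alpha)\pi}{2}$, and hence $C = \tan\left(\frac{(1 - \alpha)\pi}{2}\right)$. So, $H_0$ is rejected whenever
$x < -\tan\left(\frac{(1 - \alpha)\pi}{2}\right)$ or $x > \tan\left(\frac{(1 - \alpha)\pi}{2}\right)$. For example, for $\alpha = 0.05$,
$C = \tan(0.475\pi) \simeq 12.706$, and $H_0$ is rejected when $x < -12.706$ or $x > 12.706$.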
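
The cutoff point of the Cauchy example can be checked numerically. The following is a minimal Python sketch, assuming only the closed forms derived above; the function names are illustrative and not part of the text.

```python
import math

def cauchy_lr_cutoff(alpha):
    """Cutoff C = tan((1 - alpha) * pi / 2) of the two-sided LR test of theta = 0."""
    return math.tan((1.0 - alpha) * math.pi / 2.0)

def cauchy_upper_tail(c):
    """P(X > c) under the Cauchy p.d.f. with theta = 0."""
    return (math.pi / 2.0 - math.atan(c)) / math.pi

def reject_h0(x, alpha):
    """LR test based on a single observation x: reject when |x| > C."""
    return abs(x) > cauchy_lr_cutoff(alpha)

C = cauchy_lr_cutoff(0.05)
print(round(C, 3))                          # cutoff point, about 12.706
print(round(2 * cauchy_upper_tail(C), 4))   # total rejection probability under H0
print(reject_h0(13.0, 0.05), reject_h0(-12.0, 0.05))
```

The second printed value recovers the level of significance, which confirms that the two tails together carry probability $\alpha$.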


EXAMPLE 15

Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the Negative Exponential
p.d.f. $f(x; \theta) = \theta e^{-\theta x}$, $x > 0$ $(\theta > 0)$. Derive the LR test for testing the
hypothesis $H_0$: $\theta = \theta_0$ (against the alternative $H_A$: $\theta \ne \theta_0$) at level of significance $\alpha$.


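DISCUSSION Here

$$L(\theta \mid \mathbf{x}) = \theta^n e^{-\theta t}, \quad \text{where } \mathbf{x} = (x_1, \ldots, x_n) \text{ and } t = \sum_{i=1}^n x_i.$$

We also know that the MLE of $\theta$ is $\hat\theta = 1/\bar{x} = n/t$. Therefore the LR $\lambda$ is
given by:

$$\lambda = \frac{\theta_0^n e^{-\theta_0 t}}{(n/t)^n e^{-n}} = \left(\frac{e\theta_0}{n}\right)^n t^n e^{-\theta_0 t},$$

and hence $\lambda < \lambda_0$ if and only if $t e^{-\theta_0 t/n} < C_0$ $\left(= \frac{n}{e\theta_0}\lambda_0^{1/n}\right)$. We wish to determine the
cutoff point $C_0$. To this end, set $g(t) = t e^{-dt}$ $(d = \theta_0/n)$ and observe that $g(t)$ is
increasing for $0 < t < \frac{1}{d} = \frac{n}{\theta_0}$, decreasing for $t > \frac{n}{\theta_0}$, and attains its maximum
at $t = n/\theta_0$ (see Figure 11.7). It follows that $t e^{-dt} < C_0$ if and only if $t < C_1$ or
$t > C_2$. Therefore, by setting $T = \sum_{i=1}^n X_i$, we have: $P_{\theta_0}(T e^{-\theta_0 T/n} < C_0) = \alpha$ if
and only if $P_{\theta_0}(T < C_1 \text{ or } T > C_2) = \alpha$. For simplicity, let us take the two-tail
probabilities equal. Thus,

$$P_{\theta_0}(T < C_1) = P_{\theta_0}(T > C_2) = \frac{\alpha}{2}.$$

Figure 11.7 Graphical Determination of the Rejection Region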

















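By the fact that the independent $X_i$'s have the $f(x; \theta_0) = \theta_0 e^{-\theta_0 x}$ p.d.f.
(under $H_0$), it follows that $T$ is distributed as Gamma with $\alpha = n$ and $\beta = \frac{1}{\theta_0}$.
Therefore its p.d.f. is given by:

$$f_T(t) = \frac{\theta_0^n}{\Gamma(n)}\, t^{n-1} e^{-\theta_0 t}, \quad t > 0.$$

Then $C_1$ and $C_2$ are determined by:

$$\int_0^{C_1} f_T(t)\,dt = \int_{C_2}^{\infty} f_T(t)\,dt = \frac{\alpha}{2}.$$

In order to be able to proceed further, take, e.g., $n = 2$. Then

$$f_T(t) = \theta_0^2\, t e^{-\theta_0 t}, \quad t > 0,$$

and

$$\int_0^{C_1} \theta_0^2\, t e^{-\theta_0 t}\,dt = 1 - e^{-\theta_0 C_1} - \theta_0 C_1 e^{-\theta_0 C_1},$$

$$\int_{C_2}^{\infty} \theta_0^2\, t e^{-\theta_0 t}\,dt = e^{-\theta_0 C_2} + \theta_0 C_2 e^{-\theta_0 C_2}.$$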

Thus, the relations $P_{\theta_0}(T < C_1) = P_{\theta_0}(T > C_2) = \frac{\alpha}{2}$ become, equivalently,
for $\alpha = 0.05$:

$$0.975e^{p} - p = 1, \quad p = \theta_0 C_1; \qquad 0.025e^{q} - q = 1, \quad q = \theta_0 C_2.$$

By trial and error, we find: $p \simeq 0.242$ and $q \simeq 5.572$, so that $C_1 = 0.242/\theta_0$
and $C_2 = 5.572/\theta_0$. Thus, for $n = 2$ and by splitting the error $\alpha = 0.05$ equally
between the two tails, the LR test rejects $H_0$ when $t\,(= x_1 + x_2) < 0.242/\theta_0$ or
$t > 5.572/\theta_0$. For example, for $\theta_0 = 1$, the test rejects $H_0$ when $t < 0.242$ or
$t > 5.572$.
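
The trial-and-error step can be replaced by a simple root finder. The sketch below is a hedged illustration for $n = 2$ and $\alpha = 0.05$: it applies plain bisection to the two tail equations $e^{-p}(1+p) = 1 - \alpha/2$ and $e^{-q}(1+q) = \alpha/2$, which are the same relations as above rewritten in terms of the tail probabilities. The helper names and the bracketing intervals are assumptions made for this illustration.

```python
import math

def gamma2_tail_equation(v, tail):
    """e^{-v}(1 + v) - tail; zero when P(T > v / theta0) = tail for n = 2."""
    return math.exp(-v) * (1.0 + v) - tail

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (flo > 0) == (fmid > 0):
            lo, flo = mid, fmid   # root lies to the right of mid
        else:
            hi = mid              # root lies to the left of mid
    return 0.5 * (lo + hi)

alpha = 0.05
# p = theta0 * C1: the lower tail equals alpha/2, i.e. e^{-p}(1 + p) = 1 - alpha/2
p = bisect(lambda v: gamma2_tail_equation(v, 1 - alpha / 2), 1e-9, 2.0)
# q = theta0 * C2: the upper tail equals alpha/2, i.e. e^{-q}(1 + q) = alpha/2
q = bisect(lambda v: gamma2_tail_equation(v, alpha / 2), 2.0, 50.0)
print(round(p, 3), round(q, 3))
```

Dividing the two roots by $\theta_0$ gives the cutoff points $C_1$ and $C_2$; the roots come out near 0.242 and 5.57.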


APPLICATIONS TO THE NORMAL CASE The applications to be dis- 
cussed here are organized as follows: First, we consider the one-sample case 
and test a hypothesis about the mean, regardless of whether the variance is 
known or unknown. Next, a hypothesis is tested about the variance, regardless 
of whether the mean is known or not. Second, we consider the two-sample 
problem and make the realistic assumption that all parameters are unknown. 
Then the hypothesis is tested about the equality of the means, and, finally, the 
variances are compared through their ratio. 


11.4.1 Testing Hypotheses for the Parameters in a Single Normal Population


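Here $X_1, \ldots, X_n$ is a random sample from the $N(\mu, \sigma^2)$, and we are interested
in testing: (i) $H_0$: $\mu = \mu_0$, $\sigma$ known; (ii) $H_0$: $\mu = \mu_0$, $\sigma$ unknown; (iii) $H_0$:
$\sigma = \sigma_0$ (or $\sigma^2 = \sigma_0^2$), $\mu$ known; (iv) $H_0$: $\sigma = \sigma_0$ (or $\sigma^2 = \sigma_0^2$), $\mu$ unknown.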


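DISCUSSION
(i) $H_0$: $\mu = \mu_0$, $\sigma$ known. Under $H_0$,

$$L(\hat\omega) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu_0)^2\right],$$

and

$$L(\hat\Omega) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \bar{x})^2\right], \quad \text{since } \hat\mu_\Omega = \bar{x}.$$

Forming the likelihood ratio $\lambda$ and taking $-2\log\lambda$, we have:

$$-2\log\lambda = \frac{1}{\sigma^2}\sum_{i=1}^n \left[(x_i - \mu_0)^2 - (x_i - \bar{x})^2\right] = \left[\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}\right]^2. \tag{45}$$

Then

$$-2\log\lambda > \lambda_0 \ \text{ if and only if } \ \left[\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}\right]^2 > C^2 \quad (\text{some } C = \sqrt{\lambda_0} > 0),$$

and this happens if and only if

$$\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} < -C, \quad \text{or} \quad \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} > C.$$

Under $H_0$, $\frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma} \sim N(0, 1)$, so that the relation

$$P_{\mu_0}\left[\frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma} < -C, \ \text{or} \ \frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma} > C\right] = \alpha \quad \text{gives } C = z_{\alpha/2}.$$

Thus, the likelihood ratio test is:

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} < -z_{\alpha/2}, \ \text{or} \ \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} > z_{\alpha/2} \\ 0 & \text{otherwise.} \end{cases} \tag{46}$$

Figure 11.8 Rejection Region of the Hypothesis $H_0$ in (i) (the Shaded Areas), and the Respective Probabilities of Rejection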




















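Since, for any $\mu$,

$$\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} > z_{\alpha/2} \ \text{ is equivalent to } \ \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} > \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} + z_{\alpha/2}, \tag{47}$$

and likewise

$$\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} < -z_{\alpha/2} \ \text{ is equivalent to } \ \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} < \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} - z_{\alpha/2}, \tag{48}$$

it follows that the power of the test is given by:

$$\pi_\varphi(\mu) = 1 - \Phi\left[\frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} + z_{\alpha/2}\right] + \Phi\left[\frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} - z_{\alpha/2}\right]. \tag{49}$$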


Numerical Example Suppose $n = 36$, $\mu_0 = 10$, and let $\alpha = 0.01$.

DISCUSSION Here $z_{\alpha/2} = z_{0.005} = 2.58$, and, if $\sigma = 4$, the power of the test
is:

For $\mu = 12$, $\frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} = \frac{6(10 - 12)}{4} = -3$, so that
$\pi_\varphi(12) = 1 - \Phi(-0.42) + \Phi(-5.58) \simeq \Phi(0.42) = 0.662757$;
and for $\mu = 6$, $\pi_\varphi(6) = 1 - \Phi(8.58) + \Phi(3.42) \simeq 0.999687$.
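
The power formula (49) is easy to evaluate numerically. The following Python sketch uses the standard library's Normal d.f. and the rounded table value $z_{0.005} = 2.58$ from the example; the function name `z_test_power` is an illustrative choice, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu, mu0, sigma, n, z):
    """Power (49): 1 - Phi(d + z) + Phi(d - z), with d = sqrt(n)(mu0 - mu)/sigma."""
    phi = NormalDist().cdf            # d.f. of the N(0, 1) distribution
    d = sqrt(n) * (mu0 - mu) / sigma
    return 1.0 - phi(d + z) + phi(d - z)

z = 2.58                              # table value z_{0.005} used in the example
print(round(z_test_power(12, 10, 4, 36, z), 4))   # power at mu = 12
print(round(z_test_power(6, 10, 4, 36, z), 4))    # power at mu = 6
print(round(z_test_power(10, 10, 4, 36, z), 4))   # at mu = mu0 the power is the size
```

Evaluating the function at $\mu = \mu_0$ returns the size of the test, roughly 0.01 here, which is a quick sanity check on the formula.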
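(ii) $H_0$: $\mu = \mu_0$, $\sigma$ unknown. Under $H_0$,

$$L(\hat\omega) = (2\pi\hat\sigma_\omega^2)^{-n/2} \exp\left[-\frac{1}{2\hat\sigma_\omega^2}\sum_{i=1}^n (x_i - \mu_0)^2\right] = (2\pi\hat\sigma_\omega^2)^{-n/2} \exp\left(-\frac{n}{2}\right),$$

and

$$L(\hat\Omega) = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left[-\frac{1}{2\hat\sigma_\Omega^2}\sum_{i=1}^n (x_i - \bar{x})^2\right] = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left(-\frac{n}{2}\right),$$

since $\hat\sigma_\omega^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu_0)^2$ and $\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$. Then

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\right)^{n/2}, \quad \text{or} \quad \lambda^{2/n} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \mu_0)^2}.$$

Observe that

$$\sum_{i=1}^n (x_i - \mu_0)^2 = \sum_{i=1}^n \left[(x_i - \bar{x}) + (\bar{x} - \mu_0)\right]^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2,$$

and set $t = \sqrt{n}(\bar{x} - \mu_0) \Big/ \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}$. Then

$$\lambda^{2/n} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2}
= \frac{1}{1 + \frac{n(\bar{x} - \mu_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}$$

$$= \frac{1}{1 + \frac{1}{n-1} \times \frac{n(\bar{x} - \mu_0)^2}{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}}
= \frac{1}{1 + \frac{t^2}{n-1}}.$$

Since $g(\lambda) = \lambda^{2/n}$ is a strictly increasing function of $\lambda$, the LR test rejects $H_0$
when $\lambda^{2/n} < C_1$ or $\frac{1}{1 + \frac{t^2}{n-1}} < C_1$ or $1 + \frac{t^2}{n-1} > C_2$ or $t^2 > C_3$ or, finally, $t < -C$ or
$t > C$. Under $H_0$, the distribution of

$$t(\mathbf{X}) = \frac{\sqrt{n}(\bar{X} - \mu_0)}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2}}$$

is $t_{n-1}$. Since $P_{\mu_0}[t(\mathbf{X}) < -C, \text{ or } t(\mathbf{X}) > C] = \alpha$, it follows that $C = t_{n-1;\alpha/2}$.
Therefore, the LR test is:

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } t < -t_{n-1;\alpha/2} \ \text{or} \ t > t_{n-1;\alpha/2} \\ 0 & \text{otherwise,} \end{cases} \tag{50}$$

where

$$t = t(\mathbf{x}) = \sqrt{n}(\bar{x} - \mu_0) \Bigg/ \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}. \tag{51}$$

Figure 11.9 Rejection Region of the Hypothesis $H_0$ in (ii) (the Shaded Areas), and the Respective Probabilities; Here $C = t_{n-1;\alpha/2}$

Numerical Example If $n = 85$ and $\alpha = 0.01$, we find that $t_{n-1;\alpha/2} =
t_{84;0.005} = 2.6356$. Thus, the test rejects $H_0$ whenever $t$ is $< -2.6356$ or $t > 2.6356$.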
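
To see the equivalence between the LR and the $t$ statistic in concrete terms, the following Python sketch computes both from a small sample and checks the identity $\lambda^{2/n} = 1/(1 + t^2/(n-1))$. The data values are hypothetical, chosen only for illustration, and the critical value must be supplied from a $t$ table because the standard library has no $t$ quantile function.

```python
from math import sqrt

def one_sample_t(xs, mu0):
    """t statistic of relation (51)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return sqrt(n) * (xbar - mu0) / sqrt(ss / (n - 1))

def lr_to_power_2_over_n(xs, mu0):
    """lambda^{2/n} = sum (x_i - xbar)^2 / sum (x_i - mu0)^2."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / sum((x - mu0) ** 2 for x in xs)

def reject_h0(t, critical_value):
    """Two-sided rule (50); critical_value is t_{n-1; alpha/2} taken from a table."""
    return t < -critical_value or t > critical_value

xs = [9.8, 10.4, 10.1, 9.6, 10.9, 10.3]   # hypothetical observations
mu0 = 10.0
t = one_sample_t(xs, mu0)
lhs = lr_to_power_2_over_n(xs, mu0)
rhs = 1.0 / (1.0 + t ** 2 / (len(xs) - 1))
print(abs(lhs - rhs) < 1e-12)             # the two expressions agree
```

Because $\lambda$ is a decreasing function of $t^2$, small values of $\lambda$ correspond exactly to large values of $|t|$, which is why the test can be run on $t$ alone.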


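(iii) $H_0$: $\sigma = \sigma_0$ (or $\sigma^2 = \sigma_0^2$), $\mu$ known. Under $H_0$,

$$L(\hat\omega) = (2\pi\sigma_0^2)^{-n/2} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i - \mu)^2\right],$$

and

$$L(\hat\Omega) = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left[-\frac{1}{2\hat\sigma_\Omega^2}\sum_{i=1}^n (x_i - \mu)^2\right] = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left(-\frac{n}{2}\right),$$

since $\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$. Therefore

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\sigma_0^2}\right)^{n/2} e^{n/2} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i - \mu)^2\right]$$

$$= e^{n/2}\left[\frac{1}{n}\sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2\right]^{n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2\right]$$

$$= e^{n/2} u^{n/2} \exp\left(-\frac{nu}{2}\right), \quad \text{where } u = \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2.$$

The function $\lambda = \lambda(u)$, $u \ge 0$, has the following properties:

$$\begin{aligned}
&\lambda(u) \text{ is strictly increasing for } 0 < u < 1, \\
&\lambda(u) \text{ is strictly decreasing for } u > 1, \\
&\max\{\lambda(u);\ 0 \le u < \infty\} = \lambda(1) = 1, \text{ and} \\
&\lambda(u) \to 0, \text{ as } u \to \infty, \text{ and, of course, } \lambda(0) = 0.
\end{aligned} \tag{52}$$

On the basis of these observations, the picture of $\lambda(u)$ is as in Figure 11.10.

Figure 11.10

Therefore $\lambda(u) < \lambda_0$ if and only if $u < C_1$, or $u > C_2$, where $C_1$ and $C_2$ are
determined by the requirement that

$$P_{\sigma_0}(U < C_1, \text{ or } U > C_2) = \alpha, \quad U = \frac{1}{n}\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma_0}\right)^2.$$

However, under $H_0$, $\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma_0}\right)^2 \sim \chi_n^2$, so that $U = \frac{X}{n}$ with $X \sim \chi_n^2$. Then
$P_{\sigma_0}(U < C_1, \text{ or } U > C_2) = P_{\sigma_0}(X < nC_1, \text{ or } X > nC_2) = \alpha$, and, for
convenience, we may take the two-tail probabilities equal to $\frac{\alpha}{2}$. Then
$nC_1 = \chi^2_{n;1-\alpha/2}$, $nC_2 = \chi^2_{n;\alpha/2}$. Summarizing what we have done so far, we have:

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2 \le \chi^2_{n;1-\alpha/2}, \ \text{or} \ \sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2 \ge \chi^2_{n;\alpha/2} \\ 0 & \text{otherwise.} \end{cases} \tag{53}$$

Figure 11.11 Rejection Region of the Hypothesis $H_0$ in (iii) (the Shaded Areas), and the Respective Probabilities; Here $C_1 = \chi^2_{n;1-\alpha/2}$, $C_2 = \chi^2_{n;\alpha/2}$

Numerical Example For $n = 40$ and $\alpha = 0.01$, we find $\chi^2_{n;1-\alpha/2} = \chi^2_{40;0.995} =
20.707$ and $\chi^2_{n;\alpha/2} = \chi^2_{40;0.005} = 66.766$. Therefore the test rejects $H_0$ whenever
$\sum_{i=1}^n \left(\frac{x_i - \mu}{\sigma_0}\right)^2$ is either $< 20.707$ or $> 66.766$.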
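
The shape of $\lambda(u)$ described in (52) can be checked numerically. The sketch below is a minimal Python illustration that evaluates $\lambda(u) = e^{n/2}u^{n/2}e^{-nu/2}$ on the log scale, to avoid overflow for large $n$, and applies the two-sided rule with the table values quoted in the numerical example. The function names are hypothetical.

```python
import math

def lr_lambda(u, n):
    """lambda(u) = e^{n/2} u^{n/2} exp(-n u / 2), i.e. (u e^{1-u})^{n/2}."""
    if u == 0:
        return 0.0
    return math.exp(0.5 * n * (1.0 + math.log(u) - u))

def variance_statistic(xs, mu, sigma0):
    """Sum of squared standardized deviations; chi-square with n d.f. under H0."""
    return sum(((x - mu) / sigma0) ** 2 for x in xs)

def reject_h0(stat, lower, upper):
    """Two-sided rule (53) with table values lower and upper."""
    return stat <= lower or stat >= upper

n = 40
print(lr_lambda(1.0, n))                        # the maximum of lambda(u), equal to 1
print(lr_lambda(0.5, n) < lr_lambda(0.8, n))    # increasing on (0, 1)
print(lr_lambda(2.0, n) < lr_lambda(1.5, n))    # decreasing on (1, infinity)
print(reject_h0(70.0, 20.707, 66.766))          # a statistic beyond the upper cutoff
```

Since $\lambda(u)$ rises to its maximum at $u = 1$ and falls afterwards, the set $\{\lambda(u) < \lambda_0\}$ is the union of two tails in $u$, which is what the rejection region expresses.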


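(iv) $H_0$: $\sigma = \sigma_0$ (or $\sigma^2 = \sigma_0^2$), $\mu$ unknown. Under $\omega$,

$$L(\hat\omega) = (2\pi\sigma_0^2)^{-n/2} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i - \bar{x})^2\right], \quad \text{since } \hat\mu_\omega = \bar{x},$$

and

$$L(\hat\Omega) = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left[-\frac{1}{2\hat\sigma_\Omega^2}\sum_{i=1}^n (x_i - \bar{x})^2\right] = (2\pi\hat\sigma_\Omega^2)^{-n/2} \exp\left(-\frac{n}{2}\right),$$

since $\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$. Therefore

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\sigma_0^2}\right)^{n/2} e^{n/2} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i - \bar{x})^2\right],$$

and then proceed exactly as in the previous case with $u = \frac{1}{n}\sum_{i=1}^n \left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2$, in
order to arrive at the following modified test; namely,

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2 \le \chi^2_{n-1;1-\alpha/2}, \ \text{or} \ \sum_{i=1}^n \left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2 \ge \chi^2_{n-1;\alpha/2} \\ 0 & \text{otherwise.} \end{cases} \tag{54}$$

Numerical Example With the values of $n$ and $\alpha$ as in the previous case
($n = 40$, $\alpha = 0.01$), we find $\chi^2_{n-1;1-\alpha/2} = \chi^2_{39;0.995} = 19.996$ and $\chi^2_{n-1;\alpha/2} =
\chi^2_{39;0.005} = 65.476$, so that the test rejects $H_0$ whenever $\sum_{i=1}^n \left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2$ is $< 19.996$
or $> 65.476$.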


11.4.2 Comparing the Parameters of Two Normal Populations


Here, we have two independent random samples $X_1, \ldots, X_m \sim N(\mu_1, \sigma_1^2)$ and
$Y_1, \ldots, Y_n \sim N(\mu_2, \sigma_2^2)$ with all parameters unknown. The two populations are
compared, first, by way of their means, and second, through their variances.
When comparing these populations through their means, it is necessary from a
mathematical viewpoint (i.e., in order to be able to derive an exact distribution
for the test statistic) that the variances, although unknown, be equal.


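(i) $H_0$: $\mu_1 = \mu_2 = \mu$, say, unknown, $\sigma_1 = \sigma_2 = \sigma$, say, unknown.
The (joint) likelihood function of the $X_i$'s and the $Y_j$'s here is

$$(2\pi\sigma^2)^{-\frac{m+n}{2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m (x_i - \mu_1)^2 + \sum_{j=1}^n (y_j - \mu_2)^2\right]\right\}. \tag{55}$$

Maximizing (55) with respect to $\mu_1$, $\mu_2$, and $\sigma^2$, we find for their MLE's:

$$\hat\mu_{1,\Omega} = \bar{x}, \quad \hat\mu_{2,\Omega} = \bar{y}, \quad \hat\sigma_\Omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right]. \tag{56}$$

Hence

$$L(\hat\Omega) = (2\pi\hat\sigma_\Omega^2)^{-\frac{m+n}{2}} \exp\left(-\frac{m+n}{2}\right). \tag{57}$$

Next, under $H_0$, the (joint) likelihood function becomes

$$(2\pi\sigma^2)^{-\frac{m+n}{2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m (x_i - \mu)^2 + \sum_{j=1}^n (y_j - \mu)^2\right]\right\}, \tag{58}$$

from the maximization (with respect to $\mu$ and $\sigma^2$) of which we obtain the MLE's

$$\hat\mu_\omega = \frac{1}{m+n}\left(\sum_{i=1}^m x_i + \sum_{j=1}^n y_j\right) = \frac{m\bar{x} + n\bar{y}}{m+n}, \quad
\hat\sigma_\omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^m (x_i - \hat\mu_\omega)^2 + \sum_{j=1}^n (y_j - \hat\mu_\omega)^2\right]. \tag{59}$$

Inserting these expressions in (58), we then have

$$L(\hat\omega) = (2\pi\hat\sigma_\omega^2)^{-\frac{m+n}{2}} \exp\left(-\frac{m+n}{2}\right). \tag{60}$$

Thus, the likelihood ratio becomes, on account of (57) and (60),

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\right)^{\frac{m+n}{2}}, \quad \text{or} \quad \lambda^{2/(m+n)} = \frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}. \tag{61}$$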
63) > a 


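Next,

$$\sum_{i=1}^m (x_i - \hat\mu_\omega)^2 = \sum_{i=1}^m \left[(x_i - \bar{x}) + (\bar{x} - \hat\mu_\omega)\right]^2 = \sum_{i=1}^m (x_i - \bar{x})^2 + m(\bar{x} - \hat\mu_\omega)^2$$

$$= \sum_{i=1}^m (x_i - \bar{x})^2 + m\left(\bar{x} - \frac{m\bar{x} + n\bar{y}}{m+n}\right)^2 = \sum_{i=1}^m (x_i - \bar{x})^2 + \frac{mn^2(\bar{x} - \bar{y})^2}{(m+n)^2},$$

and likewise,

$$\sum_{j=1}^n (y_j - \hat\mu_\omega)^2 = \sum_{j=1}^n (y_j - \bar{y})^2 + \frac{m^2 n(\bar{x} - \bar{y})^2}{(m+n)^2}.$$

Then, by means of (56) and (59), $\hat\sigma_\omega^2$ is written as follows:

$$\hat\sigma_\omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right] + \frac{mn^2(\bar{x} - \bar{y})^2 + m^2 n(\bar{x} - \bar{y})^2}{(m+n)^3}$$

$$= \hat\sigma_\Omega^2 + \frac{mn(\bar{x} - \bar{y})^2 (m+n)}{(m+n)^3} = \hat\sigma_\Omega^2 + \frac{mn(\bar{x} - \bar{y})^2}{(m+n)^2}. \tag{62}$$

Therefore (61) yields, by way of (62) and (56),

$$\lambda^{2/(m+n)} = \frac{\hat\sigma_\Omega^2}{\hat\sigma_\Omega^2 + \frac{mn(\bar{x} - \bar{y})^2}{(m+n)^2}}
= \frac{1}{1 + \frac{mn(\bar{x} - \bar{y})^2}{(m+n)^2} \times \frac{1}{\hat\sigma_\Omega^2}}$$

$$= \left\{1 + \frac{mn(\bar{x} - \bar{y})^2}{m+n} \times \left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right]^{-1}\right\}^{-1}$$

$$= \left\{1 + \frac{1}{m+n-2} \times \frac{\frac{mn}{m+n}(\bar{x} - \bar{y})^2}{\frac{1}{m+n-2}\left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right]}\right\}^{-1}
= \left(1 + \frac{t^2}{m+n-2}\right)^{-1},$$

where

$$t = t(\mathbf{x}, \mathbf{y}) = \sqrt{\frac{mn}{m+n}}\,(\bar{x} - \bar{y}) \Bigg/ \sqrt{\frac{1}{m+n-2}\left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right]}. \tag{63}$$

So, $\lambda^{2/(m+n)} = \left(1 + \frac{t^2}{m+n-2}\right)^{-1}$ and hence $\lambda = \left(1 + \frac{t^2}{m+n-2}\right)^{-\frac{m+n}{2}}$. Since $\lambda$ is strictly
decreasing in $t^2$, the LR test rejects $H_0$ whenever $t^2 > C_0$ or, equivalently,
$t < -C$ or $t > C$. The constant $C$ is to be determined by

$$P_{H_0}[t(\mathbf{X}, \mathbf{Y}) < -C, \text{ or } t(\mathbf{X}, \mathbf{Y}) > C] = \alpha, \tag{64}$$

where $t(\mathbf{X}, \mathbf{Y})$ is taken from (63) with the $x_i$'s and the $y_j$'s being replaced by
the r.v.'s $X_i$'s and $Y_j$'s. However, under $H_0$,

$$t(\mathbf{X}, \mathbf{Y}) \sim t_{m+n-2}, \tag{65}$$

so that (64) yields $C = t_{m+n-2;\alpha/2}$. In conclusion, then,

$$\varphi(\mathbf{x}, \mathbf{y}) = \begin{cases} 1 & \text{if } t(\mathbf{x}, \mathbf{y}) < -t_{m+n-2;\alpha/2}, \ \text{or} \ t(\mathbf{x}, \mathbf{y}) > t_{m+n-2;\alpha/2} \\ 0 & \text{otherwise,} \end{cases} \tag{66}$$

where $t(\mathbf{x}, \mathbf{y})$ is given by (63).

Figure 11.13 Rejection Region of the Hypothesis $H_0$ in (i) (Shaded Areas), and the Respective Probabilities; Here $C = t_{m+n-2;\alpha/2}$

Numerical Example For $m = 40$, $n = 50$, and $\alpha = 0.01$, we get
$t_{m+n-2;\alpha/2} = t_{88;0.005} = 2.6329$, and hence the hypothesis about equality of
means is rejected whenever $|t(\mathbf{x}, \mathbf{y})| > 2.6329$.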
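
The reduction of the LR to the pooled two-sample $t$ statistic can be verified on data. The following Python sketch, with hypothetical sample values, computes $t(\mathbf{x}, \mathbf{y})$ from (63) and, separately, the ratio $\hat\sigma_\Omega^2/\hat\sigma_\omega^2$ from (56) and (59), and checks that the latter equals $(1 + t^2/(m+n-2))^{-1}$. The function names are illustrative.

```python
from math import sqrt

def pooled_t(xs, ys):
    """Two-sample t statistic of relation (63), common unknown variance."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    return sqrt(m * n / (m + n)) * (xbar - ybar) / sqrt(ss / (m + n - 2))

def variance_mle_ratio(xs, ys):
    """sigma_hat_Omega^2 / sigma_hat_omega^2, i.e. lambda^{2/(m+n)} from (61)."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    mu_omega = (m * xbar + n * ybar) / (m + n)        # MLE of the common mean
    s_Omega = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    s_omega = sum((x - mu_omega) ** 2 for x in xs) + sum((y - mu_omega) ** 2 for y in ys)
    return s_Omega / s_omega                          # the factors 1/(m+n) cancel

xs = [5.1, 4.8, 5.6, 5.0, 5.3]          # hypothetical first sample
ys = [4.4, 4.9, 4.6, 4.2, 4.7, 4.5]     # hypothetical second sample
t = pooled_t(xs, ys)
lhs = variance_mle_ratio(xs, ys)
rhs = 1.0 / (1.0 + t ** 2 / (len(xs) + len(ys) - 2))
print(abs(lhs - rhs) < 1e-12)
```

The test itself then compares $|t|$ with the table value $t_{m+n-2;\alpha/2}$.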


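(ii) $H_0$: $\sigma_1 = \sigma_2 = \sigma$, say (or $\sigma_1^2 = \sigma_2^2 = \sigma^2$, say), $\mu_1$, $\mu_2$ unknown. The (joint)
likelihood function of the $X_i$'s and $Y_j$'s is here

$$(2\pi)^{-\frac{m+n}{2}} (\sigma_1^2)^{-m/2} (\sigma_2^2)^{-n/2} \exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^n (y_j - \mu_2)^2\right]. \tag{67}$$

Maximizing (67) with respect to all four parameters, we find the following
MLE's:

$$\hat\mu_{1,\Omega} = \bar{x}, \quad \hat\mu_{2,\Omega} = \bar{y}, \quad \hat\sigma_{1,\Omega}^2 = \frac{1}{m}\sum_{i=1}^m (x_i - \bar{x})^2, \quad \hat\sigma_{2,\Omega}^2 = \frac{1}{n}\sum_{j=1}^n (y_j - \bar{y})^2. \tag{68}$$

Then

$$L(\hat\Omega) = (2\pi)^{-\frac{m+n}{2}} (\hat\sigma_{1,\Omega}^2)^{-m/2} (\hat\sigma_{2,\Omega}^2)^{-n/2} \exp\left(-\frac{m+n}{2}\right). \tag{69}$$

Under $H_0$, the likelihood function has the form (55), and the MLE's are already
available and given by (56). That is,

$$\hat\mu_{1,\omega} = \bar{x}, \quad \hat\mu_{2,\omega} = \bar{y}, \quad \hat\sigma_\omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^m (x_i - \bar{x})^2 + \sum_{j=1}^n (y_j - \bar{y})^2\right]. \tag{70}$$

Therefore,

$$L(\hat\omega) = (2\pi)^{-\frac{m+n}{2}} (\hat\sigma_\omega^2)^{-\frac{m+n}{2}} \exp\left(-\frac{m+n}{2}\right). \tag{71}$$

For simplicity, set $\sum_{i=1}^m (x_i - \bar{x})^2 = a$, $\sum_{j=1}^n (y_j - \bar{y})^2 = b$. Then the LR is,
by means of (68) through (71),

$$\lambda = \frac{(\hat\sigma_{1,\Omega}^2)^{m/2} (\hat\sigma_{2,\Omega}^2)^{n/2}}{(\hat\sigma_\omega^2)^{(m+n)/2}}
= \frac{(m+n)^{(m+n)/2}}{m^{m/2} n^{n/2}} \times \frac{a^{m/2} b^{n/2}}{(a+b)^{(m+n)/2}}$$

$$= \frac{(m+n)^{(m+n)/2}}{m^{m/2} n^{n/2}} \times \frac{(a/b)^{m/2}}{(1 + a/b)^{(m+n)/2}} \quad (\text{dividing by } b^{(m+n)/2})$$

$$= \frac{(m+n)^{(m+n)/2}}{m^{m/2} n^{n/2}} \times \frac{\left(\frac{m-1}{n-1} \times \frac{a/(m-1)}{b/(n-1)}\right)^{m/2}}{\left(1 + \frac{m-1}{n-1} \times \frac{a/(m-1)}{b/(n-1)}\right)^{(m+n)/2}}$$

$$= \frac{(m+n)^{(m+n)/2}}{m^{m/2} n^{n/2}} \times \frac{\left(\frac{m-1}{n-1}u\right)^{m/2}}{\left(1 + \frac{m-1}{n-1}u\right)^{(m+n)/2}}, \quad \text{where } u = \frac{a/(m-1)}{b/(n-1)}.$$

So

$$\lambda = \lambda(u) = \frac{(m+n)^{(m+n)/2}}{m^{m/2} n^{n/2}} \times \frac{\left(\frac{m-1}{n-1}u\right)^{m/2}}{\left(1 + \frac{m-1}{n-1}u\right)^{(m+n)/2}}, \quad u \ge 0. \tag{72}$$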


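The function $\lambda(u)$ has the following properties:

$$\begin{aligned}
&\lambda(0) = 0 \text{ and } \lambda(u) \to 0, \text{ as } u \to \infty, \\
&\tfrac{d}{du}\lambda(u) = 0 \text{ for } u = u_0 = \tfrac{m(n-1)}{n(m-1)}, \quad \tfrac{d}{du}\lambda(u) > 0 \text{ for } u < u_0, \text{ and} \\
&\tfrac{d}{du}\lambda(u) < 0 \text{ for } u > u_0, \text{ so that } \lambda(u) \text{ is} \\
&\text{maximized for } u = u_0, \text{ and } \lambda(u_0) = 1.
\end{aligned} \tag{73}$$

On the basis of these properties, the picture of $\lambda(u)$ is as in Figure 11.14.

Figure 11.14 The Graph of the Function $\lambda = \lambda(u)$ Given in Relation (72)

Therefore $\lambda(u) < \lambda_0$ if and only if $u < C_1$ or $u > C_2$, where $C_1$ and $C_2$ are
determined by the requirement that

$$P_{H_0}(U < C_1, \text{ or } U > C_2) = \alpha, \quad \text{where } U = \frac{\sum_{i=1}^m (X_i - \bar{X})^2/(m-1)}{\sum_{j=1}^n (Y_j - \bar{Y})^2/(n-1)}. \tag{74}$$

Figure 11.15 Rejection Region of the Hypothesis $H_0$ in (ii) (Shaded Areas), and the Respective Probabilities; Here $C_1 = F_{m-1,n-1;1-\alpha/2}$, $C_2 = F_{m-1,n-1;\alpha/2}$

Under $H_0$, $U \sim F_{m-1,n-1}$, and this allows the determination of $C_1$, $C_2$. For
simplicity, we may split the probability $\alpha$ equally among the two tails, in which
case

$$C_1 = F_{m-1,n-1;1-\alpha/2}, \quad C_2 = F_{m-1,n-1;\alpha/2}.$$

To summarize then, the LR test is as follows:

$$\varphi(\mathbf{x}, \mathbf{y}) = \begin{cases} 1 & \text{if } u(\mathbf{x}, \mathbf{y}) < F_{m-1,n-1;1-\alpha/2}, \ \text{or} \ u(\mathbf{x}, \mathbf{y}) > F_{m-1,n-1;\alpha/2} \\ 0 & \text{otherwise,} \end{cases} \tag{75}$$

where

$$u(\mathbf{x}, \mathbf{y}) = \frac{\sum_{i=1}^m (x_i - \bar{x})^2/(m-1)}{\sum_{j=1}^n (y_j - \bar{y})^2/(n-1)}. \tag{76}$$

Numerical Example Let $m = 13$, $n = 19$, and take $\alpha = 0.05$.

DISCUSSION If $X \sim F_{12,18}$, then we get from the F-tables: $P(X > F_{12,18;0.025}) = 0.025$,
or $P(X \le F_{12,18;0.025}) = 0.975$, and hence $F_{12,18;0.025} = 2.7689$.
Also, $P(X \le F_{12,18;0.975}) = 0.025$, or $P(X > F_{12,18;0.975}) = 0.975$, or
$P\left(\frac{1}{X} < \frac{1}{F_{12,18;0.975}}\right) = 0.975$. But then $\frac{1}{X} \sim F_{18,12}$, and therefore $\frac{1}{F_{12,18;0.975}} = 3.1076$, and
hence $F_{12,18;0.975} \simeq 0.3218$. Thus, the hypothesis $H_0$ is rejected whenever
$u(\mathbf{x}, \mathbf{y}) < 0.3218$, or $u(\mathbf{x}, \mathbf{y}) > 2.7689$, and it is accepted otherwise.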
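
The behavior of $\lambda(u)$ in (72) and the resulting F-type rule can be sketched as follows. This Python illustration evaluates (72) on the log scale and checks that the maximum value 1 is attained at $u_0 = m(n-1)/(n(m-1))$; the cutoff points are the table values from the numerical example ($m = 13$, $n = 19$, $\alpha = 0.05$). The function names are hypothetical.

```python
import math

def variance_ratio(xs, ys):
    """u(x, y) of relation (76): ratio of the two unbiased sample variances."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    a = sum((x - xbar) ** 2 for x in xs)
    b = sum((y - ybar) ** 2 for y in ys)
    return (a / (m - 1)) / (b / (n - 1))

def lr_lambda(u, m, n):
    """Relation (72), computed through logarithms for numerical stability."""
    if u == 0:
        return 0.0
    v = (m - 1) * u / (n - 1)
    log_lam = (0.5 * (m + n) * math.log(m + n) - 0.5 * m * math.log(m)
               - 0.5 * n * math.log(n) + 0.5 * m * math.log(v)
               - 0.5 * (m + n) * math.log(1.0 + v))
    return math.exp(log_lam)

def reject_h0(u, lower, upper):
    """Two-sided rule (75) with table values lower and upper."""
    return u < lower or u > upper

m, n = 13, 19
u0 = m * (n - 1) / (n * (m - 1))
print(round(lr_lambda(u0, m, n), 10))          # maximum value of lambda(u)
print(lr_lambda(0.3218, m, n) < 1.0, lr_lambda(2.7689, m, n) < 1.0)
print(reject_h0(3.0, 0.3218, 2.7689), reject_h0(1.0, 0.3218, 2.7689))
```

Note that the equal-tails cutoffs need not give the same value of $\lambda$ at both ends; the equal split is a convenience, as the text points out.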


EXERCISES





4.1 A coin, with probability $\theta$ of falling heads, is tossed independently 100
times and 60 heads are observed. At level of significance $\alpha = 0.1$:
(i) Use the LR test in order to test the hypothesis $H_0$: $\theta = 1/2$ (against
the alternative $H_A$: $\theta \ne 1/2$).
(ii) Employ the appropriate approximation (see relation (10) in Chapter
8) to determine the cutoff point.


4.2 Let $X_1, X_2, X_3$ be independent r.v.'s distributed as $B(1, \theta)$, $\theta \in \Omega = (0, 1)$,
and let $t = x_1 + x_2 + x_3$, where the $x_i$'s are the observed values of the
$X_i$'s.

(i) Derive the LR test $\lambda$ for testing the hypothesis $H_0$: $\theta = 0.25$ (against
the alternative $H_A$: $\theta \ne 0.25$) at level of significance $\alpha = 0.02$.

(ii) Calculate the distribution of $\lambda(T)$ and carry out the test, where $T =
X_1 + X_2 + X_3$.


4.3 (i) In reference to Example 15 in Chapter 1, the appropriate model to
be employed is the Normal distribution $N(\mu, \sigma^2)$ (with $\mu > 0$, of
course).

(ii) If $\mu_0$ is the stipulated average growth, then this will be checked by
testing the hypothesis $H_0$: $\mu = \mu_0$ (against the alternative $H_A$: $\mu \ne
\mu_0$) at level of significance $\alpha$.

(iii) On the basis of a random sample of size $n$, use the likelihood ratio
test to test $H_0$ when $n = 25$, $\mu_0 = 6$ inches, and $\alpha = 0.05$.


4.4 (i) In reference to Example 17 in Chapter 1, an appropriate model would
be the following. Let $X_i$ and $Y_i$ be the blood pressure of the $i$th
individual before and after the use of the pill, and set $Z_i = Y_i - X_i$, $i =
1, \ldots, n$. Furthermore, it is reasonable to assume that the $X_i$'s and
the $Y_i$'s are independent and Normally distributed, so that the $Z_i$'s
are independently distributed as $N(\mu, \sigma^2)$.

(ii) With $\mu$ denoting the difference of blood pressure after the usage of
the pill and before it, the claim is that $\mu < 0$. This claim is checked by
testing the hypothesis $H_0$: $\mu = 0$ (against the alternative $H_A$: $\mu \ne 0$,
with the only viable part of it here being $\mu > 0$) at level of significance
$\alpha$, by using the likelihood ratio test.

(iii) Carry out the test if $n = 90$ and $\alpha = 0.05$.


4.5 In reference to Example 25 in Chapter 1:

(i) For $i = 1, \ldots, 15$, let $X_i$ and $Y_i$ be the heights of the cross-fertilized
plants and self-fertilized plants, respectively. It is reasonable to assume
that the $X_i$'s and the $Y_i$'s are independent random samples with
respective distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ (the estimates of
$\sigma_1^2$ and $\sigma_2^2$ do not justify the possible assumption of a common
variance). Setting $Z_i = X_i - Y_i$, we have that the $Z_i$'s are independent
and distributed as $N(\mu, \sigma^2)$, where $\mu = \mu_1 - \mu_2$, $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

(ii) The claim is that $\mu > 0$, and is to be checked by testing the hypothesis
$H_0$: $\mu = 0$ (against the alternative $H_A$: $\mu \ne 0$, with the only viable
part of it being that $\mu > 0$) at level of significance $\alpha$, by using the
likelihood ratio test.

(iii) Carry out the test when $\alpha = 0.05$ and $\alpha = 0.10$.


4.6 The diameters of certain cylindrical items produced by a machine are
r.v.'s distributed as $N(\mu, 0.01)$. A sample of size 16 is taken and it is found
that $\bar{x} = 2.48$ inches. If the desired value for $\mu$ is 2.5 inches, formulate
the appropriate testing hypothesis problem and carry out the test if $\alpha =
0.05$.


4.7 A manufacturer claims that packages of certain goods contain 18 ounces.
In order to check his claim, 100 packages are chosen at random from a
large lot and it is found that $\sum_{i=1}^{100} x_i = 1{,}752$ and $\sum_{i=1}^{100} x_i^2 = 31{,}157$.
Assume that the observations are Normally distributed, and formulate
the manufacturer's claim as a testing hypothesis problem. Carry out the
test at level of significance $\alpha = 0.01$.


4.8 The breaking powers of certain steel bars produced by processes A and B
are r.v.'s distributed as Normal with possibly different means but the same
variance. A random sample of size 25 is taken from bars produced by each
one of the processes, and it is found that $\bar{x} = 60$, $s_x = 6$, $\bar{y} = 65$, $s_y = 7$.
Test whether there is a difference between the two processes at the level
of significance $\alpha = 0.05$.


4.9 (i) Let $X_i$, $i = 1, \ldots, 9$ and $Y_j$, $j = 1, \ldots, 10$ be independent r.v.'s from
the distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Suppose that
the observed values of the sample s.d.'s are $s_x = 2$, $s_y = 3$. At level
of significance $\alpha = 0.05$, test the hypothesis $H_0$: $\sigma_1 = \sigma_2$ (against the
alternative $H_A$: $\sigma_1 \ne \sigma_2$).

(ii) Find an expression for the computation of the power of the test for
$\sigma_1 = 2$ and $\sigma_2 = 3$.


4.10 Refer to Exercise 3.12, and suppose that the variances $\sigma_1^2$ and $\sigma_2^2$ are
unknown. Then test the hypothesis $H_0$: $\sigma_1 = \sigma_2$ (against the alternative
$H_A$: $\sigma_1 \ne \sigma_2$) at level of significance $\alpha = 0.05$.


4.11 The independent random samples $X_i$ and $Y_i$, $i = 1, \ldots, 5$ represent
resistance measurements taken on two test pieces, and the observed values
(in ohms) are as follows:


$x_1 = 0.118$, $x_2 = 0.125$, $x_3 = 0.121$, $x_4 = 0.117$, $x_5 = 0.120$,
$y_1 = 0.114$, $y_2 = 0.115$, $y_3 = 0.119$, $y_4 = 0.120$, $y_5 = 0.110$.


Assume that the $X_i$'s and the $Y_i$'s are Normally distributed, and test the
hypothesis $H_0$: $\sigma_1 = \sigma_2$ (against the alternative $H_A$: $\sigma_1 \ne \sigma_2$) at level of
significance $\alpha = 0.05$.


4.12 Refer to Exercise 4.11, and assume now that $\sigma_1 = \sigma_2 = \sigma$, say, unknown
(which is supported by the fact that the hypothesis $H_0$: $\sigma_1 = \sigma_2$ was not
rejected). Then test the hypothesis $H_0$: $\mu_1 = \mu_2$ (against the alternative
$H_A$: $\mu_1 \ne \mu_2$) at level of significance $\alpha = 0.05$.


4.13 Consider the independent random samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$
from the respective distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ where $\sigma$ is
known, and suppose we are interested in testing the hypothesis $H_0$: $\mu_1 =
\mu_2 = \mu$, say, unknown (against the alternative $H_A$: $\mu_1 \ne \mu_2$) at level of
significance $\alpha$, by means of the likelihood ratio test. Set $\mathbf{x} = (x_1, \ldots, x_m)$
and $\mathbf{y} = (y_1, \ldots, y_n)$ for the observed values of the $X_i$'s and the $Y_j$'s.
(i) Form the joint likelihood function $L(\mu_1, \mu_2 \mid \mathbf{x}, \mathbf{y})$ of the $X_i$'s and
the $Y_j$'s, as well as the likelihood function $L(\mu \mid \mathbf{x}, \mathbf{y})$.
(ii) From part (i), conclude immediately that the MLE's of $\mu_1$ and $\mu_2$
are $\hat\mu_1 = \bar{x}$ and $\hat\mu_2 = \bar{y}$. Also, show that the MLE of $\mu$ is given by

$$\hat\mu_\omega = \frac{m\bar{x} + n\bar{y}}{m+n}.$$

(iii) Show that $-2\log\lambda = mn(\bar{x} - \bar{y})^2/\sigma^2(m+n)$, where $\lambda = L(\hat\omega)/L(\hat\Omega)$.
(iv) From part (iii), conclude that the likelihood ratio test $-2\log\lambda > C_0$
is equivalent to $|\bar{x} - \bar{y}| > C$ $\left(= \sigma\sqrt{(m+n)C_0/mn}\right)$.
(v) Show that $C = z_{\alpha/2}\,\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}$.
(vi) For any $\mu_1$ and $\mu_2$, show that the power of the test depends on $\mu_1$
and $\mu_2$ through their difference $\mu_1 - \mu_2 = \Delta$, say, and is given by
the formula:

$$\pi(\Delta) = 2 - \Phi\left(\frac{C - \Delta}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}\right) - \Phi\left(\frac{C + \Delta}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}\right).$$

(vii) Determine the cutoff point when $m = 10$, $n = 15$, $\sigma = 1$, and
$\alpha = 0.05$.
(viii) Determine the power of the test when $\Delta = 1$ and $\Delta = 2$.


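4.14 In reference to Example 15, verify the results:

$$\int_0^{C_1} \theta_0^2\, t e^{-\theta_0 t}\,dt = 1 - e^{-\theta_0 C_1} - \theta_0 C_1 e^{-\theta_0 C_1},$$

$$\int_{C_2}^{\infty} \theta_0^2\, t e^{-\theta_0 t}\,dt = e^{-\theta_0 C_2} + \theta_0 C_2 e^{-\theta_0 C_2}.$$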


4.15 Verify expression (49) for the power of the test. 


4.16 Verify the assertions made in expressions (52) about the function $\lambda =
\lambda(u)$, $u \ge 0$.


4.17 Verify the assertion made in relation (56) that $\hat\mu_{1,\Omega}$, $\hat\mu_{2,\Omega}$, and $\hat\sigma_\Omega^2$ are the MLE's
of $\mu_1$, $\mu_2$, and $\sigma^2$, respectively.


4.18 Show that the expressions in relation (59) are, indeed, the MLE's of $(\mu_1 =
\mu_2 =)\,\mu$ and $\sigma^2$, respectively.


4.19 Show that $\lambda = \lambda(t^2) = \left(1 + \frac{t^2}{m+n-2}\right)^{-\frac{m+n}{2}}$ is, indeed, strictly decreasing in
$t^2$ as asserted right after relation (63).


4.20 Justify the statement made in relation (65) that $t(\mathbf{X}, \mathbf{Y}) \sim t_{m+n-2}$.


4.21 Show that the expressions in relation (68) are, indeed, the MLE's of $\mu_1$, $\mu_2$,
$\sigma_1^2$, and $\sigma_2^2$, respectively.

4.22 Verify the assertions made in expression (73) about the function $\lambda =
\lambda(u)$, $u \ge 0$.


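4.23 Refer to the Bivariate Normal distribution discussed in Chapter 4, Section
5, whose p.d.f. is given by:

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}, \quad x, y \in \Re,$$

where $q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]$, and $\mu_1, \mu_2 \in \Re$, $\sigma_1^2, \sigma_2^2 >
0$, and $-1 < \rho < 1$ are the parameters of the distribution. Also, recall that
independence between $X$ and $Y$ is equivalent to their being uncorrelated;
i.e., $\rho = 0$. In this exercise, a test is derived for testing the hypothesis
$H_0$: $\rho = 0$ (against the alternative $H_A$: $\rho \ne 0$, the $X$ and $Y$ are not
independent). The test statistic is based on the likelihood ratio statistic.
(i) On the basis of a random sample of size $n$ from a Bivariate Normal
distribution, $(X_i, Y_i)$, $i = 1, \ldots, n$, the MLE's of the parameters
involved are given by:

$$\hat\mu_1 = \bar{x}, \quad \hat\mu_2 = \bar{y}, \quad \hat\sigma_1^2 = S_x, \quad \hat\sigma_2^2 = S_y, \quad \hat\rho = S_{xy}\big/\sqrt{S_x S_y},$$

where $S_x = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$, $S_y = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2$, $S_{xy} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})
(y_i - \bar{y})$, and the $x_i$'s and $y_i$'s are the observed values of the $X_i$'s
and $Y_i$'s. (See Exercise 1.14 (vii) in Chapter 9.)

(ii) Under the hypothesis $H_0$: $\rho = 0$, the MLE's of $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are
the same as in part (i).

Hint: It follows immediately, because the joint p.d.f. of the pairs
factorizes to the joint p.d.f. of the $X_i$'s times the joint p.d.f. of the $Y_i$'s.

(iii) When replacing the parameters by their MLE's, the likelihood
function, call it $L(\mathbf{x}, \mathbf{y})$, is given by:

$$L(\mathbf{x}, \mathbf{y}) = \left[2\pi\left(S_x S_y - S_{xy}^2\right)^{1/2}\right]^{-n} e^{-n},$$

where $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$.
(iv) Under the hypothesis $H_0$ $(\rho = 0)$, when the parameters are replaced
by their MLE's, the likelihood function, call it $L_0(\mathbf{x}, \mathbf{y})$, is given by:

$$L_0(\mathbf{x}, \mathbf{y}) = \left(2\pi\sqrt{S_x S_y}\right)^{-n} e^{-n}.$$

(v) From parts (iii) and (iv), it follows that the likelihood ratio statistic
$\lambda$ is given by:

$$\lambda = (1 - \hat\rho^2)^{n/2}, \quad \hat\rho = S_{xy}\big/\sqrt{S_x S_y}.$$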

4.24 (i) By differentiation, show that the function $f(r) = (1 - r)^{n/2}$ is
decreasing in $r$. Therefore, in reference to Exercise 4.23(v), $\lambda < \lambda_0$ is
equivalent to $\hat\rho^2 > C_1$, some constant $C_1$ (actually, $C_1 = 1 - \lambda_0^{2/n}$);
equivalently, $\hat\rho < -C_2$ or $\hat\rho > C_2$ ($C_2 = \sqrt{C_1}$).

(ii) Since the LR test rejects the hypothesis $H_0$ when $\lambda < \lambda_0$, part (i)
states that the LR test is equivalent to rejecting $H_0$ whenever $\hat\rho < -C_2$
or $\hat\rho > C_2$.

(iii) In $\hat\rho$, replace the $x_i$'s and the $y_i$'s by the respective r.v.'s $X_i$ and $Y_i$,
and set $R$ for the resulting r.v. Then, in part (ii), carrying out the
test based on $\hat\rho$ requires knowledge of the cutoff point $C_2$, which
in turn, presupposes knowledge of the distribution of $R$ (under $H_0$).
Although the distribution of $R$ can be determined (see, e.g., Corollary
to Theorem 7, page 474, in the book A Course in Mathematical
Statistics, 2nd edition (1997), Academic Press, by G. G. Roussas), it
is not of any of the known forms, and hence no tables can be used.

(iv) Set $W = W(R) = \frac{\sqrt{n-2}\,R}{\sqrt{1 - R^2}}$, and show that $W$ is an increasing function
of $R$ by showing that $\frac{d}{dr} W(r)$ is positive.

(v) By parts (ii) and (iv), it follows that the likelihood ratio test is
equivalent to rejecting $H_0$ whenever $W(r) < -C$ or $W(r) > C$,
where $C$ is determined by the requirement that $P_{H_0}[W(R) < -C$ or
$W(R) > C] = \alpha$ (the given level of significance).

(vi) Under $H_0$, it can be shown (see, e.g., pages 472-474, in the book cited
in part (iii) above) that $W(R)$ has the $t_{n-2}$ distribution. It follows that
$C = t_{n-2;\alpha/2}$.

To summarize then, for testing $H_0$: $\rho = 0$ at level of significance
$\alpha$, reject $H_0$ whenever $W(r) < -t_{n-2;\alpha/2}$ or $W(r) > t_{n-2;\alpha/2}$, where
$W(r) = \frac{\sqrt{n-2}\,r}{\sqrt{1 - r^2}}$, $r = \hat\rho = S_{xy}\big/\sqrt{S_x S_y}$; this test is equivalent to the
likelihood ratio test.


More About Testing
Hypotheses


In this chapter, a few more topics are discussed on testing hypotheses
problems. More specifically, LR tests are presented for the Multinomial distribution
with further applications to contingency tables. A brief section is devoted to
the so-called (Chi-Square) goodness-of-fit tests, and another also brief section
discusses the decision-theoretic approach to testing hypotheses. The chapter
is concluded with a result connecting testing hypotheses and construction of
confidence regions.


It was stated in Section 3 of Chapter 8 that the statistic $-2\log\lambda$ is distributed
approximately as $\chi_f^2$ with certain degrees of freedom $f$, provided some
regularity conditions are met. In this section, this result is stated in a more formal
way, although the required conditions will not be spelled out.


THEOREM 1

On the basis of the random sample $X_1, \ldots, X_n$ from the p.d.f. $f(\cdot; \theta)$,
$\theta \in \Omega \subseteq \Re^r$, $r \ge 1$, we wish to test the hypothesis $H_0$: $\theta \in \omega \subset \Omega$ at level
of significance $\alpha$ and on the basis of the Likelihood Ratio statistic
$\lambda = \lambda(X_1, \ldots, X_n)$. Then, provided certain conditions are met, it holds
that:

$$-2\log\lambda \simeq \chi^2_{r-m}, \quad \text{for all sufficiently large } n \text{ and } \theta \in \omega;$$

more formally,

$$P_\theta(-2\log\lambda \le x) \to G(x), \quad x \ge 0, \quad \text{as } n \to \infty, \tag{1}$$

where $G$ is the d.f. of the $\chi^2_{r-m}$ distribution; $r$ is the dimensionality of
$\Omega$, $m$ is the dimensionality of $\omega$, and $\theta \in \omega$.











The practical use of (1) is that (for sufficiently large $n$) we can use the
$\chi^2_{r-m}$ distribution in order to determine the cutoff point $C$ of the test, which
rejects $H_0$ when $-2\log\lambda \ge C$. Specifically, $C \simeq \chi^2_{r-m;\alpha}$. Thus, for testing the
hypothesis $H_0$ at level of significance $\alpha$, $H_0$ is to be rejected whenever $-2\log\lambda$
is $\ge \chi^2_{r-m;\alpha}$ (always provided $n$ is sufficiently large).


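The Multinomial Case A multinomial experiment, with $k$ possible outcomes
$O_1, \ldots, O_k$ and respective unknown probabilities $p_1, \ldots, p_k$, is carried
out independently $n$ times, and let $X_1, \ldots, X_k$ be the r.v.'s denoting the number
of times outcomes $O_1, \ldots, O_k$ occur, respectively. Then the joint p.d.f. of the
$X_i$'s is:

$$f(x_1, \ldots, x_k; \theta) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}, \tag{2}$$

for $x_1, \ldots, x_k \ge 0$ integers with $x_1 + \cdots + x_k = n$, and $\theta = (p_1, \ldots, p_k)$. The
parameter space $\Omega$ is $(k - 1)$-dimensional and is defined by:

$$\Omega = \{(p_1, \ldots, p_k) \in \Re^k;\ p_i > 0,\ i = 1, \ldots, k,\ p_1 + \cdots + p_k = 1\}.$$

DISCUSSION Suppose we wish to test the hypothesis $H_0$: $p_i = p_{i0}$, $i =
1, \ldots, k$ (specified) at level of significance $\alpha$. Under $H_0$,

$$L(\hat\omega) = \frac{n!}{x_1! \cdots x_k!}\, p_{10}^{x_1} \cdots p_{k0}^{x_k},$$

and we know that the MLE's of the $p_i$'s are: $\hat{p}_i = \frac{x_i}{n}$, $i = 1, \ldots, k$. Therefore

$$L(\hat\Omega) = \frac{n!}{x_1! \cdots x_k!}\left(\frac{x_1}{n}\right)^{x_1} \cdots \left(\frac{x_k}{n}\right)^{x_k}
= n^{-n}\,\frac{n!}{x_1! \cdots x_k!}\, x_1^{x_1} \cdots x_k^{x_k}.$$

Therefore

$$\lambda = n^n \left(\frac{p_{10}}{x_1}\right)^{x_1} \cdots \left(\frac{p_{k0}}{x_k}\right)^{x_k}, \quad \text{and } H_0 \text{ is rejected when } -2\log\lambda \ge \chi^2_{k-1;\alpha},$$

since here $r = k - 1$ and $m = 0$.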

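Numerical Example The fairness of a die is to be tested on the basis of
the following outcomes of 30 independent rollings: $x_1 = 4$, $x_2 = 7$, $x_3 = 3$, $x_4 =
8$, $x_5 = 4$, $x_6 = 4$. Take $\alpha = 0.05$.

DISCUSSION Here $H_0$: $p_i = \frac{1}{6}$, $i = 1, \ldots, 6$ and the LR $\lambda$ is given by:

$$\lambda = 30^{30}\left(\frac{1}{6 \times 4}\right)^4 \left(\frac{1}{6 \times 7}\right)^7 \left(\frac{1}{6 \times 3}\right)^3 \left(\frac{1}{6 \times 8}\right)^8 \left(\frac{1}{6 \times 4}\right)^4 \left(\frac{1}{6 \times 4}\right)^4$$

$$= 30^{30} \times 6^{-30} \times 4^{-4} \times 7^{-7} \times 3^{-3} \times 8^{-8} \times 4^{-4} \times 4^{-4}.$$

It follows that $-2\log\lambda \simeq 3.810$, whereas $\chi^2_{5;0.05} = 11.071$. Thus, the
hypothesis $H_0$ is not rejected.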
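
The multinomial LR statistic is conveniently computed as $-2\log\lambda = 2\sum_i x_i \log\frac{x_i}{n p_{i0}}$, which follows directly from the expression for $\lambda$ above. The following Python sketch evaluates it for the die data (the result is about 3.81) and compares it with the table value $\chi^2_{5;0.05} = 11.071$; the function name is an illustrative choice, and cells with zero counts are taken to contribute zero to the sum.

```python
import math

def multinomial_lr_statistic(counts, probs0):
    """-2 log lambda = 2 * sum x_i * log(x_i / (n * p_i0)); empty cells contribute 0."""
    n = sum(counts)
    return 2.0 * sum(x * math.log(x / (n * p))
                     for x, p in zip(counts, probs0) if x > 0)

counts = [4, 7, 3, 8, 4, 4]          # outcomes of the 30 rollings
probs0 = [1 / 6] * 6                 # fair die under H0
stat = multinomial_lr_statistic(counts, probs0)
cutoff = 11.071                      # chi-square table value, 5 d.f., alpha = 0.05
print(round(stat, 3))
print(stat >= cutoff)                # False: H0 is not rejected
```

Working with logarithms avoids forming the very large and very small factors in $\lambda$ itself, such as $30^{30}$.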





Figure 12.1 An $r \times s$ Contingency Table (rows $1, \ldots, r$; columns $1, \ldots, s$; the $(i, j)$th cell lies in the $i$th row and the $j$th column)


















































Application to Contingency Tables Consider a multinomial experiment
with $r \times s$ possible outcomes arranged in a rectangular array with $r$ rows and $s$
columns. Such a rectangular array is referred to as an $r \times s$ contingency table.
The $r$ rows and $s$ columns generate $r \times s$ cells. (See Figure 12.1.) Denote by
$p_{ij}$ the probability that an outcome will fall into the $(i, j)$th cell. Carry out the
multinomial experiment under consideration $n$ independent times, and let $X_{ij}$
be the r.v. denoting the number of outcomes falling into the $(i, j)$th cell. Define
$p_{i.}$ and $p_{.j}$ by the formulas:

$$p_{i.} = \sum_{j=1}^{s} p_{ij}, \quad i = 1, \ldots, r, \qquad p_{.j} = \sum_{i=1}^{r} p_{ij}, \quad j = 1, \ldots, s. \tag{3}$$

Then, clearly, $p_{i.}$ is the probability that an outcome falls in the $i$th row
regardless of column, and $p_{.j}$ is the probability that an outcome falls in the $j$th
column regardless of row. Of course, $\sum_{i=1}^{r} p_{i.} = \sum_{j=1}^{s} p_{.j} = \sum_{i=1}^{r} \sum_{j=1}^{s} p_{ij} = 1$.
Also, define the r.v.'s $X_{i.}$ and $X_{.j}$ as follows:

$$X_{i.} = \sum_{j=1}^{s} X_{ij}, \quad i = 1, \ldots, r, \qquad X_{.j} = \sum_{i=1}^{r} X_{ij}, \quad j = 1, \ldots, s. \tag{4}$$

Thus, clearly, $X_{i.}$ denotes the number of outcomes falling in the $i$th row
regardless of column, and $X_{.j}$ denotes the number of outcomes falling in the
$j$th column regardless of row. It is also clear that

$$\sum_{i=1}^{r} X_{i.} = \sum_{j=1}^{s} X_{.j} = \sum_{i=1}^{r} \sum_{j=1}^{s} X_{ij} = n.$$

The parameters $p_{ij}$, $i = 1, \ldots, r$, $j = 1, \ldots, s$ are, in practice, unknown
and are estimated by the MLE's $\hat{p}_{ij} = x_{ij}/n$. In the testing hypotheses framework,
one could test the hypothesis $H_0$: $p_{ij} = p_{ij0}$, $i = 1, \ldots, r$, $j = 1, \ldots, s$, specified.
However, from a practical viewpoint, this is not an interesting hypothesis.
What is of true interest here is to test independence of rows and columns. In 
order to provide some motivation, suppose that some subjects (e.g., human 
beings) are classified according to two characteristics to be denoted by A and 
B (e.g., human beings are classified according to gender, characteristic A, and 
whether or not they are cigarette smokers, characteristic B). Suppose that 
characteristic A has r levels and characteristic B has s levels. (In the concrete 
example at hand, r = 2 (Male, Female) and s = 2 (Smoker, Nonsmoker).) We 
agree to have the r rows in an r x s contingency table represent the r levels 
of characteristic A and the s columns of the contingency table represent the s 
levels of characteristic B. Then independence of rows and columns, as men- 
tioned earlier, is restated as independence of characteristics A and B or, more 
precisely, independence of the r levels of characteristic A and the s levels of 
characteristic B. (In the concrete example this would mean that gender and 
smoking/nonsmoking are independent events.) The probabilistic formulation 
of the independence stated is as follows: 

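Observe that $P(A_i \cap B_j) = p_{ij}$, $P(A_i) = p_{i.}$, and $P(B_j) = p_{.j}$. Independence
of $A_i$ and $B_j$ for all $i$ and $j$ means then that

$$P(A_i \cap B_j) = P(A_i)P(B_j), \text{ all } i \text{ and } j, \quad \text{or} \quad p_{ij} = p_{i.}\, p_{.j}, \text{ all } i \text{ and } j.$$

To put it differently, we wish to test the hypothesis that there exist (probabilities)
$p_i > 0$, $i = 1, \ldots, r$, $p_1 + \cdots + p_r = 1$ and $q_j > 0$, $j = 1, \ldots, s$,
$q_1 + \cdots + q_s = 1$, such that

$$H_0\colon\ p_{ij} = p_i q_j, \quad i = 1, \ldots, r, \quad j = 1, \ldots, s. \tag{5}$$

(Of course, then $p_i = p_{i.}$ and $q_j = p_{.j}$, all $i$ and $j$.) The MLE of $p_{ij}$ is $\hat{p}_{ij} = x_{ij}/n$,
$i = 1, \ldots, r$, $j = 1, \ldots, s$. Therefore, writing $\prod_{i,j}$ for $\prod_{i=1}^{r} \prod_{j=1}^{s}$ and setting
$\theta$ for $(p_{ij},\ i = 1, \ldots, r,\ j = 1, \ldots, s)$, we have, for the likelihood function:

$$L(\theta \mid x_{ij},\ i = 1, \ldots, r,\ j = 1, \ldots, s) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} p_{ij}^{x_{ij}}, \tag{6}$$

and

$$L(\hat{\Omega}) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} \left(\frac{x_{ij}}{n}\right)^{x_{ij}} = \frac{n!}{n^{n} \prod_{i,j} x_{ij}!} \prod_{i,j} x_{ij}^{x_{ij}}. \tag{7}$$

Under $H_0$, the likelihood function is

$$L(\omega) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} (p_i q_j)^{x_{ij}} = \frac{n!}{\prod_{i,j} x_{ij}!} \left(\prod_{i} p_i^{x_{i.}}\right) \left(\prod_{j} q_j^{x_{.j}}\right), \tag{8}$$

because

$$\prod_{i,j} (p_i q_j)^{x_{ij}} = \prod_{i} \left(p_i^{x_{i1}} q_1^{x_{i1}}\right) \cdots \left(p_i^{x_{is}} q_s^{x_{is}}\right)
= \prod_{i} p_i^{x_{i.}}\, q_1^{x_{i1}} \cdots q_s^{x_{is}}
= \left(\prod_{i} p_i^{x_{i.}}\right) \left(\prod_{j} q_j^{x_{.j}}\right).$$

The MLE's of $p_i$ and $q_j$ are given by

$$\hat{p}_i = \frac{x_{i.}}{n}, \quad i = 1, \ldots, r, \qquad \hat{q}_j = \frac{x_{.j}}{n}, \quad j = 1, \ldots, s, \tag{9}$$

so that

$$L(\hat{\omega}) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i} \left(\frac{x_{i.}}{n}\right)^{x_{i.}} \prod_{j} \left(\frac{x_{.j}}{n}\right)^{x_{.j}}
= \frac{n!}{n^{2n} \prod_{i,j} x_{ij}!} \prod_{i} x_{i.}^{x_{i.}} \prod_{j} x_{.j}^{x_{.j}}. \tag{10}$$

By (7) and (10), we have then

$$\lambda = \frac{\prod_{i} x_{i.}^{x_{i.}} \prod_{j} x_{.j}^{x_{.j}}}{n^{n} \prod_{i,j} x_{ij}^{x_{ij}}}, \tag{11}$$

and

$$-2\log\lambda = 2\left[\left(\sum_{i=1}^{r} \sum_{j=1}^{s} x_{ij} \log x_{ij} + n \log n\right) - \left(\sum_{i=1}^{r} x_{i.} \log x_{i.} + \sum_{j=1}^{s} x_{.j} \log x_{.j}\right)\right]. \tag{12}$$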
Here the dimension of $\Omega$ is $rs - 1$ because we have $rs$ parameters $p_{ij}$, $i = 1, \ldots, r$,
$j = 1, \ldots, s$, which, however, satisfy the relationship $\sum_{i=1}^{r} \sum_{j=1}^{s} p_{ij} = 1$. In order
to determine the dimension of $\omega$, observe that we have $r + s$ parameters
$p_i$, $i = 1, \ldots, r$ and $q_j$, $j = 1, \ldots, s$, which, however, satisfy two relationships;
namely, $\sum_{i=1}^{r} p_i = 1$ and $\sum_{j=1}^{s} q_j = 1$. Therefore the dimension of $\omega$ is
$r + s - 2$ and

$$\dim \Omega - \dim \omega = (rs - 1) - (r + s - 2) = (r - 1)(s - 1).$$

Furthermore, it so happens that the (unspecified) conditions of Theorem 1
are satisfied here, so that, under $H_0$, $-2\log\lambda$ is distributed approximately (for
all sufficiently large $n$) as $\chi^2_{(r-1)(s-1)}$. It follows that the hypothesis (5) about
independence is rejected, at level of significance $\alpha$, whenever

$$-2\log\lambda > \chi^2_{(r-1)(s-1);\alpha}. \tag{13}$$
Numerical Example A population consisting of $n = 100$ males (M) and
females (F) is classified according to their smoking (S) or nonsmoking (NS)
cigarette habit. Suppose the resulting $2 \times 2$ contingency table is as given below.
Then test independence of gender and smoking/nonsmoking habit at the level
of significance $\alpha = 0.05$.

             S     NS    Totals
  M         20     35       55
  F         15     30       45
  Totals    35     65      100

DISCUSSION The values $x_{ij}$ are shown in the cells and the $x_{i.}$, $x_{.j}$ are
shown in the margins, and they are: $x_{11} = 20$, $x_{12} = 35$, $x_{21} = 15$, $x_{22} = 30$, $x_{1.} = 55$,
$x_{2.} = 45$, $x_{.1} = 35$, $x_{.2} = 65$. Replacing these values in the expression of $-2\log\lambda$
given by (12), we find $-2\log\lambda = 0.061$. Here $r = s = 2$, so that $\chi^2_{(r-1)(s-1);\alpha} =
\chi^2_{1;0.05} = 3.841$. Therefore the hypothesis is not rejected.
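
Expression (12) is easy to evaluate mechanically. The sketch below is an illustration rather than part of the original text: the helper name `neg2_log_lr_independence` is hypothetical, and the critical value 3.841 is taken from the example. Direct evaluation gives a value of roughly 0.10 (the printed figure may reflect rounding in hand computation); in either case it is far below 3.841.

```python
import math

def neg2_log_lr_independence(table):
    """Evaluate -2 log(lambda) of relation (12) for an r x s table of counts."""
    def xlogx(v):
        # Convention: 0 * log 0 = 0, so empty cells contribute nothing.
        return v * math.log(v) if v > 0 else 0.0

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # x_{i.}
    col_totals = [sum(col) for col in zip(*table)]    # x_{.j}
    cells = sum(xlogx(x) for row in table for x in row)
    margins = sum(xlogx(t) for t in row_totals) + sum(xlogx(t) for t in col_totals)
    return 2.0 * ((cells + xlogx(n)) - margins)

table = [[20, 35],   # males:   smokers, nonsmokers
         [15, 30]]   # females: smokers, nonsmokers
stat = neg2_log_lr_independence(table)
critical = 3.841     # chi-square(1) upper 0.05 point, from the text
reject = stat > critical
print(round(stat, 4), reject)
```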











1.1 (i) In reference to Example 18 in Chapter 1, the appropriate probability
model is the Multinomial distribution with parameters $n$ and $p_A$, $p_B$,
$p_{AB}$, $p_O$, where $p_A$ through $p_O$ are the probabilities that an individual,
chosen at random from among the $n$ persons has blood type either A
or B or AB or O, respectively.
(ii) Let $p_{A0}$, $p_{B0}$, $p_{AB0}$, and $p_{O0}$ be a priori stipulated numbers. Then, checking
agreement of the actual probabilities with the stipulated values
amounts to testing the hypothesis

$$H_0\colon\ p_A = p_{A0},\ p_B = p_{B0},\ p_{AB} = p_{AB0},\ p_O = p_{O0}.$$

(iii) The hypothesis $H_0$ is tested by means of either the log-LR test (see
Example 1 here) or the $\chi^2$ goodness-of-fit test. (See also Exercise 2.1.)


1.2 (i) In reference to Example 19 in Chapter 1, the appropriate probability
model is the Multinomial distribution with parameters $n = 41{,}208$ and
$p_i$, $i = 1, \ldots, 12$, where $p_i = P$(a birth chosen at random from among
the $n$ births falls in the $i$th month).

(ii) Checking uniform distribution of the $n$ births over the 12 months
amounts to testing the hypothesis

$$H_0\colon\ p_i = p_{i0} = \frac{1}{12}, \quad i = 1, \ldots, 12.$$

(iii) The hypothesis $H_0$ is tested by means of either the log-LR test (see
Example 1 here) or the $\chi^2$ goodness-of-fit test. (See also Exercise 2.2.)
The hypothesis $H_0$ is rejected when $-2\log\lambda > \chi^2_{11;\alpha}$.
1.3 (i) In reference to Example 20 in Chapter 1, the appropriate probability
model is the $2 \times 3$ contingency table setup in the example.
(ii) If $p_{ij}$ is the probability that a randomly chosen subject from among the
150 falls into the $(i, j)$th cell, then independence between the factors
health and diet is checked by testing the hypothesis

$$H_0\colon\ p_{ij} = p_i q_j, \quad i = 1, 2 \text{ and } j = 1, 2, 3.$$

An appropriate test statistic for testing the hypothesis $H_0$ is either the
log-LR statistic or the $\chi^2$ goodness-of-fit statistic. (See also Exercise 2.3.)


1.4 (i) In reference to Example 21 in Chapter 1, the appropriate probability
model is the $3 \times 4$ contingency table setup in the example.

(ii) If $p_{ij}$ is the probability that a randomly chosen subject from among the
200 falls into the $(i, j)$th cell, then checking the stipulation that change
of bone minerals does not vary for different groups amounts to testing
the hypothesis

$$H_0\colon\ p_{ij} = p_i q_j, \quad i = 1, 2, 3 \text{ and } j = 1, 2, 3, 4.$$

The hypothesis $H_0$ may be checked by means of either the log-LR
test or the $\chi^2$ goodness-of-fit test. (See also Exercise 2.4.)


1.5 In reference to Example 1 of Chapter 1, the $n$ landfills are classified
according to two levels of concentration (High and Low) and three levels
of hazardous chemicals (Arsenic, Barium, and Mercury) to produce the
following $2 \times 3$ contingency table:

                               HAZARDOUS CHEMICALS
                           Arsenic    Barium    Mercury    Totals
  Level of         High    x_{11}     x_{12}    x_{13}     x_{1.}
  Concentration    Low     x_{21}     x_{22}    x_{23}     x_{2.}
                   Totals  x_{.1}     x_{.2}    x_{.3}     x_{..} = n

Then, if $p_{ij}$ is the probability that a landfill chosen at random from
among the $n$ landfills falls into the $(i, j)$th cell, part (ii) of the example
becomes that of testing the hypothesis $H_0$: $p_{ij} = p_{ij0}$, where $p_{ij0}$, $i = 1, 2$
and $j = 1, 2, 3$ are a priori stipulated numbers. The hypothesis $H_0$ is tested
by means of either the log-LR test or the $\chi^2$ goodness-of-fit test. (See also
Exercise 2.5.)


12.2 A Goodness-of-Fit Test


This test applies primarily to the Multinomial distribution, although other
distributions can also be suitably reduced to a multinomial framework. In the
notation of the previous section, we have that, for each fixed $i = 1, \ldots, k$,
$X_i \sim B(n, p_i)$, so that $E_\theta X_i = np_i$, $i = 1, \ldots, k$, $\theta = (p_1, \ldots, p_k)$. Thus, the
$i$th outcome would be expected to appear $np_i$ times, whereas the actual number
of times it appears is $X_i$. It then makes sense to compare what we expect
and what we actually observe, and do this simultaneously for all $i = 1, \ldots, k$.
One way of doing this is to look at the quantity $\sum_{i=1}^{k} (X_i - np_i)^2$. Small values
of this quantity would indicate agreement between expected and observed
values, and large values would indicate the opposite. For distributional reasons,
the above expression is modified as indicated below, and in this form it
is denoted by $\chi^2$; namely,

$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}. \tag{14}$$

Expression (14) is the basis for constructing test statistics for testing various
hypotheses. In this setting, we will consider the hypothesis $H_0$: $p_i = p_{i0}$,
$i = 1, \ldots, k$, specified as we did in the previous section. Under $H_0$, (14) is
denoted by $\chi^2_\omega$ and is equal to:

$$\chi^2_\omega = \sum_{i=1}^{k} \frac{(X_i - np_{i0})^2}{np_{i0}}. \tag{15}$$

This is a statistic and is used for testing $H_0$. Accordingly, $H_0$ is rejected,
at level of significance $\alpha$, if $\chi^2_\omega > C$, where $C$ is determined by the requirement
$P_{H_0}(\chi^2_\omega > C) = \alpha$. It can be seen that, under $H_0$, $\chi^2_\omega$ is distributed
approximately as $\chi^2_{k-1}$ for all sufficiently large $n$. Consequently, $C \simeq \chi^2_{k-1;\alpha}$.
The test used here is called a test of goodness-of-fit for obvious reasons. It is
also referred to as chi-square (or $\chi^2$) goodness-of-fit test, because of the symbol
used in relation (15), and because its asymptotic distribution (under the null
hypothesis) is chi-square with certain degrees of freedom. Thus, the (Chi-Square)
goodness-of-fit test rejects $H_0$ whenever $\chi^2_\omega > \chi^2_{k-1;\alpha}$.

For illustrative and also comparison purposes, let us consider the first 
numerical example in the previous section. 

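Numerical Example Here $np_{10} = \cdots = np_{60} = \frac{30}{6} = 5$, and then the
observed value of $\chi^2_\omega$ is:

$$\chi^2_\omega = \frac{1}{5}\left[(4-5)^2 + (7-5)^2 + (3-5)^2 + (8-5)^2 + (4-5)^2 + (4-5)^2\right] = 4.$$

For $\alpha = 0.05$, $\chi^2_{k-1;\alpha} = \chi^2_{5;0.05} = 11.071$, and since $\chi^2_\omega = 4 < 11.071$, the
hypothesis $H_0$ is not rejected, as was also the case with the LR test.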
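
Relation (15) translates directly into code. The following minimal sketch (the function name `chi_square_gof` is hypothetical, and the critical value is copied from the text) reproduces the value 4 for the die data.

```python
def chi_square_gof(counts, probs0):
    """Chi-square goodness-of-fit statistic of relation (15)."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs0))

counts = [4, 7, 3, 8, 4, 4]      # observed die frequencies
probs0 = [1 / 6] * 6             # hypothesized cell probabilities
stat = chi_square_gof(counts, probs0)
critical = 11.071                # chi-square(5) upper 0.05 point, from the text
reject = stat > critical
print(round(stat, 3), reject)
```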
In the framework of a contingency table, expression (14) becomes

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(X_{ij} - np_{ij})^2}{np_{ij}}. \tag{16}$$

Under the hypothesis of independence stated in (5), expression (16) takes the
form

$$\chi^2_\omega = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(X_{ij} - np_i q_j)^2}{np_i q_j}. \tag{17}$$

From (17), we form the test statistic $\chi^2_{\hat{\omega}}$ defined below:

$$\chi^2_{\hat{\omega}} = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(X_{ij} - n\hat{p}_i \hat{q}_j)^2}{n\hat{p}_i \hat{q}_j}, \tag{18}$$

where $\hat{p}_i$, $i = 1, \ldots, r$ and $\hat{q}_j$, $j = 1, \ldots, s$ are given in (9). Once again, it may
be seen that, under $H_0$, $\chi^2_{\hat{\omega}}$ is distributed approximately as $\chi^2_{(r-1)(s-1)}$ for all
sufficiently large $n$. Thus, the hypothesis $H_0$ is rejected, at level of significance
$\alpha$, whenever $\chi^2_{\hat{\omega}} > \chi^2_{(r-1)(s-1);\alpha}$.
The contingency table numerical example of the previous section is as below.

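Numerical Example Here $\hat{p}_1 = 0.55$, $\hat{p}_2 = 0.45$, $\hat{q}_1 = 0.35$, $\hat{q}_2 = 0.65$,
so that

$$n\hat{p}_1\hat{q}_1 = 19.25, \quad n\hat{p}_1\hat{q}_2 = 35.75, \quad n\hat{p}_2\hat{q}_1 = 15.75, \quad n\hat{p}_2\hat{q}_2 = 29.25.$$

DISCUSSION Therefore

$$\chi^2_{\hat{\omega}} = \frac{(20 - 19.25)^2}{19.25} + \frac{(35 - 35.75)^2}{35.75} + \frac{(15 - 15.75)^2}{15.75} + \frac{(30 - 29.25)^2}{29.25} \simeq 0.0998.$$

Since $\chi^2_{(r-1)(s-1);\alpha} = \chi^2_{1;0.05} = 3.841$, the hypothesis $H_0$ is not rejected, as was
also the case with the LR test.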
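
The statistic in (18) can be sketched in a few lines. The code below is illustrative only: the name `chi_square_independence` is hypothetical, and the expected counts are formed from the marginal MLE's in (9).

```python
def chi_square_independence(table):
    """Chi-square statistic of relation (18) for an r x s table of counts."""
    n = sum(sum(row) for row in table)
    p_hat = [sum(row) / n for row in table]           # row proportions, relation (9)
    q_hat = [sum(col) / n for col in zip(*table)]     # column proportions, relation (9)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = n * p_hat[i] * q_hat[j]
            stat += (x - expected) ** 2 / expected
    return stat

table = [[20, 35],
         [15, 30]]
stat = chi_square_independence(table)
critical = 3.841                 # chi-square(1) upper 0.05 point, from the text
reject = stat > critical
print(round(stat, 4), reject)
```

The degrees of freedom, $(r-1)(s-1) = 1$ here, are not computed by the sketch; the critical value is supplied by hand.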


2.1 Same as Exercise 1.1. 
2.2 Same as Exercise 1.2. 
2.3 Same as Exercise 1.3. 
2.4 Same as Exercise 1.4. 
2.5 Same as Exercise 1.5. 


2.6 A coin, with probability p of falling heads, is tossed independently 100 
times, and 60 heads are observed. 

(i) Test the hypothesis $H_0$: $p = 1/2$ (against the alternative $H_A$: $p \ne 1/2$)
at level of significance $\alpha = 0.1$, by using the appropriate $\chi^2$
goodness-of-fit test.

(ii) Determine the P-value of the test (use linear interpolation). 


2.7 A die is cast independently 600 times, and the numbers 1 through 6 appear
with the frequencies recorded below.

  Face         1     2     3     4     5     6
  Frequency  100    94   103    89   110   104

Use the appropriate $\chi^2$ goodness-of-fit test to test fairness for the die
at level of significance $\alpha = 0.1$.
2.8 In a certain genetic experiment, two different varieties of a certain species
are crossed and a specific characteristic of the offspring can occur at 
only three levels A, B, and C, say. According to a proposed model, the 
probabilities for A, B, and C are > > and > respectively. Out of 60 

offspring, 6, 18, and 36 fall into levels A, B, and C, respectively. Test the
validity of the proposed model at the level of significance $\alpha = 0.05$. Use
the appropriate $\chi^2$ goodness-of-fit test.


2.9 Course work grades are often assumed to be Normally distributed. In a
certain class, suppose that letter grades are given in the following manner:
A for grades in the range from 90 to 100 inclusive, B for grades in the
range from 75 to 89 inclusive, C for grades in the range from 60 to 74 inclusive,
D for grades in the range from 50 to 59 inclusive, and F for grades in
the range from 0 to 49. Use the data given below to check the assumption
that the data are coming from an $N(75, 9^2)$ distribution. For this purpose,
employ the appropriate $\chi^2$ goodness-of-fit test, and take $\alpha = 0.05$.

  A    B    C    D    F
  3   12   10    4    1

Hint: Assuming that the grade of a student chosen at random is a r.v.
$X \sim N(75, 81)$, compute the probabilities of an A, B, C, D, and F. Then
use these probabilities in applying the $\chi^2$ goodness-of-fit test.


2.10 It is often assumed that the I.Q. scores of human beings are Normally
distributed. On the basis of the following data, test this claim at level of
significance $\alpha = 0.05$ by using the appropriate $\chi^2$ goodness-of-fit test.
Specifically, if $X$ is the r.v. denoting the I.Q. score of an individual chosen
at random, then:

(i) Set $p_1 = P(X \le 90)$, $p_2 = P(90 < X \le 100)$, $p_3 = P(100 < X \le 110)$,
$p_4 = P(110 < X \le 120)$, $p_5 = P(120 < X \le 130)$, $p_6 = P(X > 130)$.

(ii) Calculate the probabilities $p_i$, $i = 1, \ldots, 6$ under the assumption that
$X \sim N(100, 15^2)$ and call them $p_{i0}$, $i = 1, \ldots, 6$. Then set up the
hypothesis $H_0$: $p_i = p_{i0}$, $i = 1, \ldots, 6$.

(iii) Use the appropriate $\chi^2$ goodness-of-fit test to test the hypothesis at
level of significance $\alpha = 0.05$.

The available data are given below, where $x$ denotes the observed
number of individuals lying in a given interval.

  x <= 90   90 < x <= 100   100 < x <= 110   110 < x <= 120   120 < x <= 130   x > 130
    10           18               23               22               18             9

2.11 Consider a group of 100 people living and working under very similar 
conditions. Half of them are given a preventive shot against a certain 
disease and the other half serve as controls. Of those who received the 
treatment, 40 did not contract the disease whereas the remaining 10 did 
so. Of those not treated, 30 did contract the disease and the remaining 
20 did not. Test effectiveness of the vaccine at the level of significance 
$\alpha = 0.05$, by using the appropriate $\chi^2$ goodness-of-fit test.
Hint: For an individual chosen at random from the target population
of 100 individuals, denote by $T_1$, $T_2$ and $D_1$, $D_2$ the following events:
$T_1$ = "treated," $T_2$ = "not treated," $D_1$ = "diseased," $D_2$ = "not diseased,"
and set up the appropriate $2 \times 2$ contingency table.


2.12 On the basis of the following scores, appropriately taken, test whether
there are gender-associated differences in mathematical ability (as is often
claimed!). Take $\alpha = 0.05$, and use the appropriate $\chi^2$ goodness-of-fit
test.


Boys: 80 96 98 87 7 83 70 92 97 82 
Girls: 82 90 84 70 80 97 76 90 88 86 


Hint: Group the grades into the following intervals: [70, 75), [75, 80),
[80, 85), [85, 90), [90, 95), [95, 100), and count the grades of boys and
girls falling into each one of these intervals. Then form a $2 \times 6$ contingency
table with rows the two levels of gender (Boy, Girl), and columns
the six levels of grades. Finally, with $p_{ij}$ standing for the probability
that an individual, chosen at random from the target population, falls
into the $(i, j)$th cell, stipulate the hypothesis $H_0$: $p_{ij} = p_i q_j$, $i = 1, 2$
and $j = 1, \ldots, 6$, and proceed to test it as suggested.


2.13 From each of four political wards of a city with approximately the same 
number of voters, 100 voters were chosen at random and their opinions 
were asked regarding a certain legislative proposal. On the basis of the 
data given below, test whether the fractions of voters favoring the legislative
proposal under consideration differ in the four wards. Take $\alpha = 0.05$,
and use the appropriate $\chi^2$ goodness-of-fit test.

                                    WARD
                            1      2      3      4    Totals
  Favor proposal           37     29     32     21      119
  Do not favor proposal    63     71     68     79      281
  Totals                  100    100    100    100      400


12.3 Decision-Theoretic Approach to Testing Hypotheses


There are chapters and books written on this subject. What we plan to do in 
this section is to deal with the simplest possible case of a testing hypothesis 
problem in order to illustrate the underlying concepts. 

To this end, let $X_1, \ldots, X_n$ be a random sample with an unknown p.d.f. $f$. We
adopt the (somewhat unrealistic) position that $f$ can be one of two possible
specified p.d.f.'s, $f_0$ or $f_1$. On the basis of the observed values $x_1, \ldots, x_n$ of
$X_1, \ldots, X_n$, we are invited to decide which is the true p.d.f. This decision will
be made on the basis of a (nonrandomized) decision function $\delta = \delta(X_1, \ldots, X_n)$
defined on $\mathbb{R}^n$ into $\mathbb{R}$. More specifically, let $R$ be a subset of $\mathbb{R}^n$, and suppose
that if $\mathbf{x} = (x_1, \ldots, x_n)$ lies in $R$, we decide that $f_1$ is the true p.d.f., and if $\mathbf{x}$
lies in $R^c$ (the complement of $R$ with respect to $\mathbb{R}^n$), we decide in favor of
$f_0$. In terms of a decision function, we reach the same conclusion by taking
$\delta(\mathbf{x}) = I_R(\mathbf{x})$ (the indicator function of $R$) and deciding in favor of $f_1$ if $\delta(\mathbf{x}) = 1$
and in favor of $f_0$ if $\delta(\mathbf{x}) = 0$. Or

$$\delta(\mathbf{x}) = \begin{cases}
1 & \text{(which happens when } \mathbf{x} \in R\text{) leads to selection of } f_1\text{, and hence rejection of } f_0, \\
0 & \text{(which happens when } \mathbf{x} \in R^c\text{) leads to selection of } f_0\text{, and hence rejection of } f_1.
\end{cases} \tag{19}$$

At this point, we introduce monetary penalties for making wrong decisions,
which are expressed in terms of a loss function. Specifically, let $L(f; \delta)$ be a
function in two arguments, the p.d.f. $f$ and the decision function $\delta = \delta(\mathbf{x})$.
Then it makes sense to define $L(f; \delta)$ in the following way.

$$L(f; \delta) = \begin{cases}
0 & \text{if } f = f_0 \text{ and } \delta(\mathbf{x}) = 0 \text{ or } f = f_1 \text{ and } \delta(\mathbf{x}) = 1, \\
L_1 & \text{if } f = f_0 \text{ and } \delta(\mathbf{x}) = 1, \\
L_2 & \text{if } f = f_1 \text{ and } \delta(\mathbf{x}) = 0,
\end{cases} \tag{20}$$

where $L_1$ and $L_2$ are positive quantities.

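Next, consider the average (expected) loss when the decision function
$\delta$ is used, which is denoted by $R(f; \delta)$ and is called the risk function. In
order to find the expression of $R(f; \delta)$, let us suppose that $P_{f_0}(\mathbf{X} \in R) =
P_{f_0}[\delta(\mathbf{X}) = 1] = \alpha$ and $P_{f_1}(\mathbf{X} \in R) = P_{f_1}[\delta(\mathbf{X}) = 1] = \pi$. Then $\alpha$ is the probability
of deciding in favor of $f_1$ if, actually, $f_0$ is true, and $\pi$ is the probability
of deciding in favor of $f_1$ when $f_1$ is, actually, true. Then:

$$R(f; \delta) = \begin{cases}
L_1 P_{f_0}(\mathbf{X} \in R) = L_1 P_{f_0}[\delta(\mathbf{X}) = 1] = L_1 \alpha, & \text{if } f = f_0, \\
L_2 P_{f_1}(\mathbf{X} \in R^c) = L_2 P_{f_1}[\delta(\mathbf{X}) = 0] = L_2 (1 - \pi), & \text{if } f = f_1,
\end{cases} \tag{21}$$

or,

$$R(f_0; \delta) = L_1 P_{f_0}(\mathbf{X} \in R) = L_1 \alpha, \qquad
R(f_1; \delta) = L_2 P_{f_1}(\mathbf{X} \in R^c) = L_2 (1 - \pi). \tag{22}$$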


Let us recall that our purpose is to construct an optimal decision function
$\delta = \delta(\mathbf{x})$, where optimality is defined below on the basis of two different
criteria. From relation (22), we know which is the bigger among the risk
values $R(f_0; \delta) = L_1 \alpha$ and $R(f_1; \delta) = L_2(1 - \pi)$. That is, we have the quantity
$\max\{R(f_0; \delta), R(f_1; \delta)\}$. For any other (nonrandomized) decision function $\delta^*$
the corresponding quantity is $\max\{R(f_0; \delta^*), R(f_1; \delta^*)\}$. Then it makes sense
to choose $\delta$ so that

$$\max\{R(f_0; \delta), R(f_1; \delta)\} \le \max\{R(f_0; \delta^*), R(f_1; \delta^*)\} \tag{23}$$

for any other decision function $\delta^*$ as described above. A decision function $\delta$, if it
exists, which satisfies inequality (23) is called minimax (since it minimizes the
maximum risk). The result below, Theorem 2, provides conditions under which
the decision function $\delta$ defined by (19) is, actually, minimax. The problem is
stated as a testing problem of a simple hypothesis against a simple alternative.





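THEOREM 2 Let $X_1, \ldots, X_n$ be a random sample with p.d.f. $f$ which is either $f_0$ or $f_1$,
both completely specified. For testing the hypothesis $H_0$: $f = f_0$ against
the alternative $H_A$: $f = f_1$ at level of significance $\alpha$, define the rejection
region $R$ by:

$$R = \{(x_1, \ldots, x_n) \in \mathbb{R}^n;\ f_1(x_1) \cdots f_1(x_n) > C f_0(x_1) \cdots f_0(x_n)\},$$

and let the test function $\delta = \delta(\mathbf{x})$ ($\mathbf{x} = (x_1, \ldots, x_n)$) be defined by (19); i.e.,

$$\delta(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in R, \\ 0 & \text{if } \mathbf{x} \in R^c. \end{cases}$$

The constant $C$ is defined by the requirement that $E_{f_0}\delta(\mathbf{X}) =
P_{f_0}(\mathbf{X} \in R) = \alpha$ ($\mathbf{X} = (X_1, \ldots, X_n)$), and it is assumed that the level
of significance $\alpha$, the power $\pi$ of the test $\delta$, and the quantities $L_1$ and $L_2$
satisfy the relationship

$$(R(f_0; \delta) =)\ L_1 \alpha = L_2 (1 - \pi)\ (= R(f_1; \delta)). \tag{24}$$

Then the decision function $\delta = \delta(\mathbf{x})$ is minimax.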








REMARK 1 In connection with relation (24), observe that, if we determine
the level of significance $\alpha$, then the power $\pi$ is also determined, and therefore
relation (24) simply specifies a relationship between the losses $L_1$ and $L_2$;
they cannot be determined independently but rather one will be a function
of the other. In the present context, however, we wish to have the option of
specifying the losses $L_1$ and $L_2$, and then see what is a possible determination
of the constant $C$, which will produce a test of level of significance $\alpha$ (and of
power $\pi$) satisfying relation (24).


PROOF OF THEOREM 2 For simplicity, let us write $P_0$ and $P_1$ instead of $P_{f_0}$
and $P_{f_1}$, respectively, and likewise, $R(0; \delta)$ and $R(1; \delta)$ instead of $R(f_0; \delta)$ and
$R(f_1; \delta)$, respectively. Then assumption (24) is rewritten thus: $R(0; \delta) = L_1 \alpha =
L_2(1 - \pi) = R(1; \delta)$. Recall that we are considering only nonrandomized decision
functions. With this in mind, let $T$ be any (other than $R$) subset of $\mathbb{R}^n$,
and let $\delta^*$ be its indicator function, $\delta^*(\mathbf{x}) = I_T(\mathbf{x})$, so that $\delta^*$ is the decision
function associated with $T$. Then, in analogy with (22),

$$R(0; \delta^*) = L_1 P_0(\mathbf{X} \in T), \qquad R(1; \delta^*) = L_2 P_1(\mathbf{X} \in T^c). \tag{25}$$

Look at $R(0; \delta)$ and $R(0; \delta^*)$ and suppose that $R(0; \delta^*) \le R(0; \delta)$. This is
equivalent to $L_1 P_0(\mathbf{X} \in T) \le L_1 P_0(\mathbf{X} \in R) = L_1 \alpha$, or $P_0(\mathbf{X} \in T) \le \alpha$. So $\delta^*$,
being looked upon as a test, is of level of significance $\le \alpha$. Then by Theorem 1
in Chapter 11, the power of the test $\delta^*$, which is $P_1(\mathbf{X} \in T)$, is less than or
equal to $P_1(\mathbf{X} \in R)$, which is the power of the test $\delta$. This is so because $\delta$ is of
level of significance $\alpha$, MP among all tests of level of significance $\le \alpha$. From
$P_1(\mathbf{X} \in T) \le P_1(\mathbf{X} \in R)$ we have, equivalently, $P_1(\mathbf{X} \in T^c) \ge P_1(\mathbf{X} \in R^c)$ or
$L_2 P_1(\mathbf{X} \in T^c) \ge L_2 P_1(\mathbf{X} \in R^c)$ or $R(1; \delta^*) \ge R(1; \delta)$. To summarize, the
assumption $R(0; \delta^*) \le R(0; \delta)$ leads to $R(1; \delta) \le R(1; \delta^*)$. Hence

$$R(0; \delta^*) \le R(0; \delta) = R(1; \delta) \text{ (by (24))} \le R(1; \delta^*),$$

and therefore

$$\max\{R(0; \delta^*), R(1; \delta^*)\} = R(1; \delta^*) \ge R(1; \delta) = \max\{R(0; \delta), R(1; \delta)\}, \tag{26}$$

as desired. Next, the assumption

$$R(0; \delta) < R(0; \delta^*) \tag{27}$$

leads likewise to the inequalities

$$R(1; \delta^*) < R(1; \delta) = R(0; \delta) \text{ (by (24))} < R(0; \delta^*), \tag{28}$$

so that

$$\max\{R(0; \delta^*), R(1; \delta^*)\} = R(0; \delta^*) > R(0; \delta) = \max\{R(0; \delta), R(1; \delta)\}. \tag{29}$$

Relations (26) and (29) yield

$$\max\{R(0; \delta), R(1; \delta)\} \le \max\{R(0; \delta^*), R(1; \delta^*)\},$$

so that $\delta$ is, indeed, minimax. ▲
REMARK 2 It is to be pointed out that the minimax decision function $\delta =
\delta(\mathbf{x})$ above is the MP test of level of significance $P_{f_0}(\mathbf{X} \in R)$ for testing the
(simple) hypothesis $H_0$: $f = f_0$ against the (simple) alternative $H_A$: $f = f_1$.

REMARK 3 If the underlying p.d.f. $f$ depends on a parameter $\theta \in \Omega$, then
the two possible options $f_0$ and $f_1$ for $f$ will correspond to two values of the
parameter $\theta$, $\theta_0$ and $\theta_1$, say.


The theorem of this section is illustrated now by two examples. 

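EXAMPLE 2 On the basis of the random sample $X_1, \ldots, X_n$ from the $N(\theta, 1)$ distribution,
determine the minimax decision function $\delta = \delta(\mathbf{x})$ for testing the hypothesis
$H_0$: $\theta = \theta_0$ against the alternative $H_A$: $\theta = \theta_1$.

DISCUSSION Here the joint p.d.f. of the $X_i$'s is

$$L(\mathbf{x}; \theta) = (2\pi)^{-n/2} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2\right],$$

so that the rejection region $R$ is defined by $L(\mathbf{x}; \theta_1) > C L(\mathbf{x}; \theta_0)$ or, equivalently,
by

$$\exp[n(\theta_1 - \theta_0)\bar{x}] > C \exp\left[\tfrac{1}{2} n(\theta_1^2 - \theta_0^2)\right],$$

or

$$\bar{x} > C_0 \text{ for } \theta_1 > \theta_0, \quad \text{and} \quad \bar{x} < C_0 \text{ for } \theta_1 < \theta_0,
\qquad \text{where } C_0 = \frac{1}{2}(\theta_1 + \theta_0) + \frac{\log C}{n(\theta_1 - \theta_0)}. \tag{30}$$

Then the requirement in (24) becomes, accordingly,

$$L_1 P_{\theta_0}(\bar{X} > C_0) = L_2 P_{\theta_1}(\bar{X} \le C_0) \text{ for } \theta_1 > \theta_0,$$

and

$$L_1 P_{\theta_0}(\bar{X} < C_0) = L_2 P_{\theta_1}(\bar{X} \ge C_0) \text{ for } \theta_1 < \theta_0,$$

or

$$L_1\{1 - \Phi[\sqrt{n}(C_0 - \theta_0)]\} = L_2 \Phi[\sqrt{n}(C_0 - \theta_1)] \text{ for } \theta_1 > \theta_0,$$

and

$$L_1 \Phi[\sqrt{n}(C_0 - \theta_0)] = L_2\{1 - \Phi[\sqrt{n}(C_0 - \theta_1)]\} \text{ for } \theta_1 < \theta_0. \tag{31}$$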

Consider the following numerical application. 
Numerical Example Suppose $n = 25$ and let $\theta_0 = 0$ and $\theta_1 = 1$. In the spirit
of Remark 1, take, e.g., $L_1 = 5$ and $L_2 = 2.5$.

DISCUSSION Then the first relation in (31), which is applicable here,
becomes

$$\Phi[5(C_0 - 1)] = 2[1 - \Phi(5C_0)] \quad \text{or} \quad 2\Phi(5C_0) - \Phi(5 - 5C_0) = 1.$$

From the Normal tables, we find $C_0 = 0.53$, so that the minimax decision
function is given by:

$$\delta(\mathbf{x}) = 1 \text{ if } \bar{x} > 0.53, \quad \text{and} \quad \delta(\mathbf{x}) = 0 \text{ if } \bar{x} \le 0.53.$$

Let us now calculate the level of significance and the power of this test. We
have

$$P_0(\bar{X} > 0.53) = 1 - \Phi(5 \times 0.53) = 1 - \Phi(2.65) = 1 - 0.995975 \simeq 0.004,$$

and

$$\pi(1) = P_1(\bar{X} > 0.53) = 1 - \Phi[5(0.53 - 1)] = \Phi(2.35) = 0.990613 \simeq 0.991.$$
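
Instead of reading the cutoff off the Normal tables, the first relation in (31) can be solved numerically. The sketch below is an illustration under stated assumptions: the helper names `phi` and `minimax_cutoff` are hypothetical, bisection is one convenient root-finder among many, and the bracketing interval $[\theta_0, \theta_1]$ is valid for these particular data. The root comes out between 0.52 and 0.53, consistent with the table-based value used above.

```python
import math

def phi(z):
    """Standard Normal c.d.f., expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def minimax_cutoff(n, theta0, theta1, L1, L2, tol=1e-12):
    """Solve L1*{1 - Phi[sqrt(n)(c - theta0)]} = L2*Phi[sqrt(n)(c - theta1)] for c.

    Assumes theta1 > theta0 and that the equation changes sign on
    [theta0, theta1]; the left side minus the right side decreases in c.
    """
    root_n = math.sqrt(n)

    def g(c):
        return L1 * (1 - phi(root_n * (c - theta0))) - L2 * phi(root_n * (c - theta1))

    lo, hi = theta0, theta1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, theta0, theta1, L1, L2 = 25, 0.0, 1.0, 5.0, 2.5
c0 = minimax_cutoff(n, theta0, theta1, L1, L2)
alpha = 1 - phi(math.sqrt(n) * (c0 - theta0))   # level of significance
power = 1 - phi(math.sqrt(n) * (c0 - theta1))   # power at theta1
print(round(c0, 4), round(alpha, 4), round(power, 4))
```

At the computed cutoff the two risks $L_1\alpha$ and $L_2(1-\pi)$ agree up to the bisection tolerance, which is exactly requirement (24).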
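EXAMPLE 3 In terms of the random sample $X_1, \ldots, X_n$ from the $B(1, \theta)$ distribution,
determine the minimax function $\delta = \delta(\mathbf{x})$ for testing the hypothesis $H_0$: $\theta = \theta_0$
against the alternative $H_A$: $\theta = \theta_1$.

DISCUSSION The joint p.d.f. of the $X_i$'s is here

$$L(\mathbf{x}; \theta) = \theta^{t}(1 - \theta)^{n - t}, \quad t = x_1 + \cdots + x_n,$$

so that the rejection region $R$ is determined by

$$L(\mathbf{x}; \theta_1) > C L(\mathbf{x}; \theta_0) \quad \text{or} \quad
\left[\frac{\theta_1(1 - \theta_0)}{\theta_0(1 - \theta_1)}\right]^{t} > C \left[\frac{1 - \theta_0}{1 - \theta_1}\right]^{n},$$

or

$$t \log \frac{(1 - \theta_0)\theta_1}{\theta_0(1 - \theta_1)} > C_0' = \log C - n \log \frac{1 - \theta_1}{1 - \theta_0}.$$

This is equivalent to

$$t > C_0 \text{ for } \theta_1 > \theta_0, \quad \text{and} \quad t < C_0 \text{ for } \theta_1 < \theta_0,$$

where $C_0 = C_0' / \log \frac{(1 - \theta_0)\theta_1}{\theta_0(1 - \theta_1)}$. The requirement in (24) becomes here, respectively,

$$L_1 P_{\theta_0}(X > C_0) = L_2 P_{\theta_1}(X \le C_0) \text{ for } \theta_1 > \theta_0,$$

and

$$L_1 P_{\theta_0}(X < C_0) = L_2 P_{\theta_1}(X \ge C_0) \text{ for } \theta_1 < \theta_0,$$

or

$$L_1 P_{\theta_0}(X \le C_0) + L_2 P_{\theta_1}(X \le C_0) = L_1 \text{ for } \theta_1 > \theta_0,$$

and

$$L_1 P_{\theta_0}(X \le C_0 - 1) + L_2 P_{\theta_1}(X \le C_0 - 1) = L_2 \text{ for } \theta_1 < \theta_0, \tag{32}$$

where $X \sim B(n, \theta)$.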


Numerical Example Let $n = 20$, and suppose $\theta_0 = 0.50$ and $\theta_1 = 0.75$.

DISCUSSION Here, the first relation in (32) is applicable. Since

$$P_{0.75}(X \le C_0) = P_{0.25}(X \ge 20 - C_0) = 1 - P_{0.25}(X \le 19 - C_0), \tag{33}$$

the first relation in (32) becomes

$$L_1 P_{0.50}(X \le C_0) - L_2 P_{0.25}(X \le 19 - C_0) = L_1 - L_2,$$

or

$$L_2 = [1 - P_{0.50}(X \le C_0)] L_1 / [1 - P_{0.25}(X \le 19 - C_0)]. \tag{34}$$

At this point, let us take $L_1 = 1$ and $L_2 = 0.269$. Then the right-hand side of
(34) gives, for $C_0 = 13$: $\frac{1 - 0.9423}{1 - 0.7858} = \frac{0.0577}{0.2142} \simeq 0.269 = L_2$; i.e., the first relation in
(32) obtains. The minimax decision function $\delta = \delta(\mathbf{x})$ is then given by: $\delta(\mathbf{x}) = 1$
if $x \ge 14$, and $\delta(\mathbf{x}) = 0$ for $x \le 13$. The level of significance and the power of
this test are:

$$P_{0.50}(X \ge 14) = 1 - P_{0.50}(X \le 13) = 1 - 0.9423 = 0.0577,$$

and, on account of (33),

$$\pi(0.75) = P_{0.75}(X \ge 14) = P_{0.25}(X \le 6) = 0.7858.$$
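
The Binomial figures above can be reproduced without tables. This is a minimal sketch, assuming only the Python standard library; the helper name `binom_cdf` is hypothetical. It computes the level and power for the cutoff $C_0 = 13$ and then the loss $L_2$ that makes requirement (24) hold when $L_1 = 1$.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, theta0, theta1 = 20, 0.50, 0.75
L1, c0 = 1.0, 13                       # reject H0 when x >= c0 + 1 = 14

level = 1 - binom_cdf(c0, n, theta0)   # P_{0.50}(X >= 14)
power = 1 - binom_cdf(c0, n, theta1)   # P_{0.75}(X >= 14)
L2 = L1 * level / (1 - power)          # from L1 * alpha = L2 * (1 - pi), relation (24)
print(round(level, 4), round(power, 4), round(L2, 3))
```

Because $X$ is discrete, relation (24) can be met exactly only for particular pairs $(L_1, L_2)$; here $L_2$ is backed out from the chosen cutoff, as in the text.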


THEOREM 3 




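Instead of attempting to select $\delta = \delta(\mathbf{x})$ so as to minimize the maximum
risk, we may, instead, try to determine $\delta$ so that $\delta$ minimizes the average risk.
This approach calls for choosing the p.d.f.'s $f_0$ and $f_1$ according to a probability
distribution; choose $f_0$ with probability $p_0$ and choose $f_1$ with probability $p_1$
($p_0 + p_1 = 1$), and set $\lambda_0 = \{p_0, p_1\}$. If $R_{\lambda_0}(\delta)$ denotes the corresponding
average risk, then, on account of (22), this average is given by:

$$\begin{aligned}
R_{\lambda_0}(\delta) &= L_1 P_{f_0}(\mathbf{X} \in R)\, p_0 + L_2 P_{f_1}(\mathbf{X} \in R^c)\, p_1 \\
&= p_0 L_1 P_{f_0}(\mathbf{X} \in R) + p_1 L_2 [1 - P_{f_1}(\mathbf{X} \in R)] \\
&= p_1 L_2 + [p_0 L_1 P_{f_0}(\mathbf{X} \in R) - p_1 L_2 P_{f_1}(\mathbf{X} \in R)] \\
&= \begin{cases}
p_1 L_2 + \int_R [p_0 L_1 f_0(x_1) \cdots f_0(x_n) - p_1 L_2 f_1(x_1) \cdots f_1(x_n)]\, dx_1 \cdots dx_n, \\
p_1 L_2 + \sum_{\mathbf{x} \in R} [p_0 L_1 f_0(x_1) \cdots f_0(x_n) - p_1 L_2 f_1(x_1) \cdots f_1(x_n)],
\end{cases}
\end{aligned}$$

for the continuous and the discrete case, respectively. From this last expression,
it follows that $R_{\lambda_0}(\delta)$ is minimized, if $p_0 L_1 f_0(x_1) \cdots f_0(x_n) - p_1 L_2 f_1(x_1) \cdots
f_1(x_n)$ is $< 0$ on $R$. But $\delta(\mathbf{x}) = 1$ on $R$ and $\delta(\mathbf{x}) = 0$ on $R^c$. Thus, we may restate
these equations as follows:

$$\delta(\mathbf{x}) = \begin{cases}
1 & \text{if } f_1(x_1) \cdots f_1(x_n) > \dfrac{p_0 L_1}{p_1 L_2}\, f_0(x_1) \cdots f_0(x_n), \\
0 & \text{otherwise.}
\end{cases} \tag{35}$$

Thus, given a probability distribution $\lambda_0 = \{p_0, p_1\}$ on $\{f_0, f_1\}$, there is always a
(nonrandomized) decision function $\delta$ which minimizes the average risk $R_{\lambda_0}(\delta)$,
and this $\delta$ is given by (35) and is called a Bayes decision function.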





The Bayes decision function $\delta_{\lambda_0}(\mathbf{x})$ corresponding to the probability
distribution $\lambda_0 = \{p_0, p_1\}$ on $\{f_0, f_1\}$ is given by (35). This decision function
is, actually, the MP test for testing the hypothesis $H_0$: $f = f_0$ against the
alternative $H_A$: $f = f_1$ with cutoff point $C = p_0 L_1 / p_1 L_2$ and level of
significance $\alpha$ given by:

$$P_{f_0}[f_1(X_1) \cdots f_1(X_n) > C f_0(X_1) \cdots f_0(X_n)] = \alpha. \tag{36}$$








REMARK 4 As mentioned earlier, if the underlying p.d.f. depends on a
parameter $\theta \in \Omega$, then the above problem becomes that of testing $H_0$: $\theta = \theta_0$
against $H_A$: $\theta = \theta_1$ for some specified $\theta_0$ and $\theta_1$ in $\Omega$.


DISCUSSION In reference to Example 2 and for the case that $\theta_1 > \theta_0$,
$\delta_{\lambda_0}(\mathbf{x}) = 1$ if $\bar{x} > C_0$, $C_0 = \frac{1}{2}(\theta_1 + \theta_0) + \frac{\log C}{n(\theta_1 - \theta_0)}$, $C = p_0 L_1 / p_1 L_2$, as follows
from relation (30). For the numerical data of the same example, we obtain
$C_0 = 0.50 + 0.04 \log \frac{2p_0}{p_1}$. For example, for $p_0 = \frac{1}{2}$, $C_0$ is $\simeq 0.50 + 0.04 \times 0.693 =
0.52772 \simeq 0.53$, whereas for $p_0 = \frac{1}{4}$, $C_0$ is $\simeq 0.50 - 0.04 \times 0.405 = 0.4838 \simeq
0.48$. For $C_0 = 0.53$, the level of significance and the power have already
been calculated. For $C_0 = 0.48$, these quantities are, respectively:

$$P_0(\bar{X} > 0.48) = 1 - \Phi(5 \times 0.48) = 1 - \Phi(2.4) = 1 - 0.991802 = 0.008198,$$

$$\pi(1) = P_1(\bar{X} > 0.48) = 1 - \Phi[5(0.48 - 1)] = \Phi(2.6) = 0.995339.$$


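In reference to Example 3 and for the case that $\theta_1 > \theta_0$, $\delta_{\lambda_0}(\mathbf{x}) = 1$ if $x > C_0$,
$C_0 = \left(\log C - n \log \frac{1 - \theta_1}{1 - \theta_0}\right) / \log \frac{\theta_1(1 - \theta_0)}{\theta_0(1 - \theta_1)}$, $C = p_0 L_1 / p_1 L_2$. For the numerical data
of the same example, we have $C_0 \simeq \left(15.173 + \log \frac{p_0}{p_1}\right) / 1.099$. For $p_0 = \frac{1}{2}$, $C_0$ is
13.81, and for $p_0 = \frac{1}{4}$, $C_0$ is 12.81. In the former case, $\delta_{\lambda_0}(\mathbf{x}) = 1$ for $x \ge 14$, and
in the latter case, $\delta_{\lambda_0}(\mathbf{x}) = 1$ for $x \ge 13$. The level of significance and the power
have been calculated for the former case. As for the latter case, we have:

$$P_{0.50}(X \ge 13) = 1 - P_{0.50}(X \le 12) = 1 - 0.8684 = 0.1316,$$

$$\pi(0.75) = P_{0.75}(X \ge 13) = P_{0.25}(X \le 7) = 0.8982.$$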
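
The Bayes cutoffs for both examples follow from the closed-form expressions for $C_0$ with $C = p_0 L_1 / p_1 L_2$. The sketch below is illustrative; the function names are hypothetical, and the losses are those of the numerical examples ($L_1 = 5$, $L_2 = 2.5$ in the Normal case; $L_1 = 1$, $L_2 = 0.269$ in the Binomial case).

```python
import math

def bayes_cutoff_normal(n, theta0, theta1, p0, L1, L2):
    """C0 of relation (30) with C = p0*L1 / (p1*L2), for N(theta, 1) data."""
    C = p0 * L1 / ((1 - p0) * L2)
    return 0.5 * (theta1 + theta0) + math.log(C) / (n * (theta1 - theta0))

def bayes_cutoff_binomial(n, theta0, theta1, p0, L1, L2):
    """Cutoff for t = x_1 + ... + x_n in the B(1, theta) case, theta1 > theta0."""
    C = p0 * L1 / ((1 - p0) * L2)
    numerator = math.log(C) - n * math.log((1 - theta1) / (1 - theta0))
    denominator = math.log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))
    return numerator / denominator

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# Normal example: n = 25, theta0 = 0, theta1 = 1
cn_half = bayes_cutoff_normal(25, 0.0, 1.0, 0.5, 5.0, 2.5)
cn_quarter = bayes_cutoff_normal(25, 0.0, 1.0, 0.25, 5.0, 2.5)

# Binomial example: n = 20, theta0 = 0.50, theta1 = 0.75
cb_half = bayes_cutoff_binomial(20, 0.50, 0.75, 0.5, 1.0, 0.269)
cb_quarter = bayes_cutoff_binomial(20, 0.50, 0.75, 0.25, 1.0, 0.269)

# Smallest integer count exceeding the cutoff, then level and power for p0 = 1/4
x_min = math.floor(cb_quarter) + 1
level = 1 - binom_cdf(x_min - 1, 20, 0.50)
power = 1 - binom_cdf(x_min - 1, 20, 0.75)
print(round(cn_half, 4), round(cn_quarter, 4), round(cb_half, 2), round(cb_quarter, 2))
print(x_min, round(level, 4), round(power, 4))
```

Shifting prior mass away from $f_0$ (smaller $p_0$) lowers the cutoff, so the test rejects more readily: the level rises and so does the power.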


12.4 Relationship Between Testing Hypotheses and Confidence Regions


In this brief section, we discuss a relationship which connects a testing hypothesis
problem and the problem of constructing a confidence region for the
underlying parameter. To this effect, suppose $X_1, \ldots, X_n$ is a random sample
from the p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}^r$, $r \ge 1$, and for each $\theta$ in $\Omega$, consider
the problem of testing the hypothesis, to be denoted by $H_0(\theta)$, that the parameter
$\theta^*$, say, in $\Omega$, is, actually, equal to the value of $\theta$ considered. That is,
$H_0(\theta)$: $\theta^* = \theta$ at level of significance $\alpha$. Denote by $A(\theta)$ the respective acceptance
region in $\mathbb{R}^n$. As usual, $\mathbf{X} = (X_1, \ldots, X_n)$ and $\mathbf{x} = (x_1, \ldots, x_n)$ is the
observed value of $\mathbf{X}$. For each $\mathbf{x} \in \mathbb{R}^n$, define in $\Omega$ the region $T(\mathbf{x})$ as follows:

$$T(\mathbf{x}) = \{\theta \in \Omega;\ \mathbf{x} \in A(\theta)\}. \tag{37}$$

Thus, $T(\mathbf{x})$ consists of all those $\theta \in \Omega$ for which, on the basis of the outcome
$\mathbf{x}$, the hypothesis $H_0(\theta)$ is accepted. On the basis of the definition of $T(\mathbf{x})$
by (37), it is clear that

$$\theta \in T(\mathbf{x}) \text{ if and only if } \mathbf{x} \in A(\theta).$$

Therefore

$$P_\theta[\theta \in T(\mathbf{X})] = P_\theta[\mathbf{X} \in A(\theta)]. \tag{38}$$

But the probability on the right-hand side of (38) is equal to $1 - \alpha$, since
the hypothesis $H_0(\theta)$ being tested is of level of significance $\alpha$. Thus,

$$P_\theta[\theta \in T(\mathbf{X})] = 1 - \alpha,$$

and this means that the region $T(\mathbf{X})$ is a confidence region for $\theta$ with confidence
coefficient $1 - \alpha$.


Summarizing what has been discussed so far in the form of a theorem, we 
have the following result. 


THEOREM 4
Let $X_1, \ldots, X_n$ be a random sample from the p.d.f. $f(\cdot\,; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}^r$, $r \ge 1$, and, for each $\theta \in \Omega$, consider the problem of testing the hypothesis $H_0(\theta)\colon \theta^* = \theta$ at level of significance $\alpha$. Let $A(\theta)$ be the corresponding acceptance region in $\mathbb{R}^n$, and for each $\mathbf{x} \in \mathbb{R}^n$, define the region $T(\mathbf{x})$ in $\Omega$ as in (37). Then $T(\mathbf{X})$ is a confidence region for $\theta$ with confidence coefficient $1 - \alpha$, where $\mathbf{X} = (X_1, \ldots, X_n)$ and $\mathbf{x} = (x_1, \ldots, x_n)$ is the observed value of $\mathbf{X}$.

This result will now be illustrated by two examples.

EXAMPLE 5
On the basis of a random sample $X_1, \ldots, X_n$ from the $N(\theta, \sigma^2)$ distribution with $\sigma$ known, construct a confidence interval for $\theta$ with confidence coefficient $1 - \alpha$, by utilizing Theorem 4.


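DISCUSSION For each $\theta \in \Omega = \mathbb{R}$ and for testing the hypothesis $H_0(\theta)$ that the (unknown) parameter $\theta^*$, say, is actually equal to $\theta$, it makes sense to reject $H_0(\theta)$ when $\bar{X}$ is either too far to the left or too far to the right of $\theta$; equivalently, when $\bar{X} - \theta < C_1$ or $\bar{X} - \theta > C_2$ for some constants $C_1$, $C_2$. If $H_0(\theta)$ is to be of level of significance $\alpha$, we will have $P_\theta(\bar{X} - \theta < C_1 \text{ or } \bar{X} - \theta > C_2) = \alpha$. But under $H_0(\theta)$, the distribution of $\bar{X}$ is symmetric about $\theta$, so that it is reasonable to take $C_1 = -C_2$, and then $C_2 = z_{\alpha/2}\, \sigma/\sqrt{n}$. Thus, $H_0(\theta)$ is accepted whenever $-z_{\alpha/2}\, \sigma/\sqrt{n} \le \bar{X} - \theta \le z_{\alpha/2}\, \sigma/\sqrt{n}$, or $-z_{\alpha/2} \le \sqrt{n}(\bar{X} - \theta)/\sigma \le z_{\alpha/2}$, and, of course,

$$P_\theta\left[-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X} - \theta)}{\sigma} \le z_{\alpha/2}\right] = 1 - \alpha.$$

Thus,

$$A(\theta) = \left\{\mathbf{x} \in \mathbb{R}^n;\ \theta - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \theta + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\},$$

and therefore, by (37),

$$T(\mathbf{x}) = \{\theta \in \mathbb{R};\ \mathbf{x} \in A(\theta)\} = \left\{\theta \in \mathbb{R};\ \theta - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \theta + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = \left\{\theta \in \mathbb{R};\ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \theta \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\}.$$

In other words, we ended up with the familiar confidence interval for $\theta$, $\bar{X} \pm z_{\alpha/2}\, \sigma/\sqrt{n}$, which we have already constructed in Chapter 10, Example 1(i).

EXAMPLE 6
Let the random sample $X_1, \ldots, X_n$ be from the $N(\mu, \sigma^2)$ distribution, where both $\mu$ and $\sigma^2$ are unknown. Construct a confidence interval for $\sigma^2$ with confidence coefficient $1 - \alpha$, again using Theorem 4.

DISCUSSION Here $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ is an estimate of $\sigma^2$, and, therefore, for testing the hypothesis $H_0(\sigma^2)\colon \text{Variance} = \sigma^2$, it is reasonable to reject $H_0(\sigma^2)$ whenever the ratio of $S^2$ over the $\sigma^2$ specified by the hypothesis is either too small or too large. That is, reject $H_0(\sigma^2)$ when $S^2/\sigma^2 < C_1$ or $S^2/\sigma^2 > C_2$ for some ($> 0$) constants $C_1$, $C_2$ to be specified by the requirement that $P_{\sigma^2}(S^2/\sigma^2 < C_1 \text{ or } S^2/\sigma^2 > C_2) = \alpha$, or

$$P_{\sigma^2}\left[\frac{(n-1)S^2}{\sigma^2} < (n-1)C_1 \ \text{ or } \ \frac{(n-1)S^2}{\sigma^2} > (n-1)C_2\right] = \alpha.$$

Since under $H_0(\sigma^2)$, $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, we may choose to split the probability $\alpha$ equally between the two tails, in which case $(n-1)C_1 = \chi^2_{n-1;\,1-\alpha/2}$ and $(n-1)C_2 = \chi^2_{n-1;\,\alpha/2}$, and $H_0(\sigma^2)$ is accepted whenever

$$\chi^2_{n-1;\,1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1;\,\alpha/2}.$$

Of course,

$$P_{\sigma^2}\left[\chi^2_{n-1;\,1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1;\,\alpha/2}\right] = 1 - \alpha.$$

Then, with $s^2$ denoting the observed value of $S^2$,

$$A(\sigma^2) = \left\{\mathbf{x} \in \mathbb{R}^n;\ \chi^2_{n-1;\,1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1;\,\alpha/2}\right\},$$

and therefore (37) becomes here:

$$T(\mathbf{x}) = \{\sigma^2 \in (0, \infty);\ \mathbf{x} \in A(\sigma^2)\} = \left\{\sigma^2 \in (0, \infty);\ \chi^2_{n-1;\,1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1;\,\alpha/2}\right\} = \left\{\sigma^2 \in (0, \infty);\ \frac{(n-1)s^2}{\chi^2_{n-1;\,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1;\,1-\alpha/2}}\right\};$$

that is, we have arrived once again at the familiar confidence interval for $\sigma^2$, $\left[\frac{(n-1)s^2}{\chi^2_{n-1;\,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1;\,1-\alpha/2}}\right]$ (see Example 3 in Chapter 10).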
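The duality stated in Theorem 4 can be checked numerically for the Normal-mean case with known variance. The following is only a sketch under assumed values: the sample mean, $\sigma$, $n$, and $\alpha$ are hypothetical, and the helper names `accepts` and `confidence_interval` are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(alpha):
    """Upper alpha/2 point z_{alpha/2} of the N(0, 1) distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def accepts(theta, xbar, sigma, n, alpha):
    """True if H0(theta) is accepted, i.e., the sample lies in A(theta)."""
    return abs(sqrt(n) * (xbar - theta) / sigma) <= z_upper(alpha)


def confidence_interval(xbar, sigma, n, alpha):
    """T(x): the set of all theta for which H0(theta) is accepted."""
    half = z_upper(alpha) * sigma / sqrt(n)
    return xbar - half, xbar + half


# Hypothetical numbers, chosen only for illustration.
xbar, sigma, n, alpha = 0.53, 1.0, 25, 0.05
lo, hi = confidence_interval(xbar, sigma, n, alpha)

# theta belongs to T(x) exactly when the test of H0(theta) accepts.
for theta in (0.0, 0.2, 0.5, 0.9, 1.0):
    assert (lo <= theta <= hi) == accepts(theta, xbar, sigma, n, alpha)

print(round(lo, 3), round(hi, 3))
```

The loop compares membership in the interval with the outcome of the test for a few trial values of $\theta$; the two always agree, which is the content of relation (37).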









Chapter 13

A Simple Linear Regression Model

This is a rather extensive chapter on an important subject matter with an 
abundance of diverse applications. The basic idea involved may be described 
as follows. There is a stimulus, denoted by x, and a response to it, denoted 
by y. At different levels of x, one observes the respective responses. How 
are the resulting (x, y) pairs related, if they are related at all? There are all
kinds of possibilities, and the one discussed in this chapter is the simplest such
possibility, namely, the pairs are linearly related.

In reality, what one, actually, observes at x, due to errors, is a value of a 
r.v. Y, and then the question arises as to how we would draw a straight line, 
which would lie “close” to most of the (x, y) pairs. This leads to the Principle 
of Least Squares. On the basis of this principle, one is able to draw the so- 
called fitted linear regression line by computing the Least Squares Estimates of 
parameters involved. Also, some properties of these estimates are established. 
These things are done in the first two sections of the chapter. 

Up to this point, the errors are not required to have any specific distribution, 
other than having zero mean and finite variance. However, in order to proceed 
with statistical inference about the parameters involved, such as constructing 
confidence intervals and testing hypotheses, one has to stipulate a distribution 
for the errors; this distribution, reasonably enough, is assumed to be Normal. 
As a consequence of it, one is in a position to specify the distribution of all 
estimates involved and proceed with the inference problems referred to above. 
These issues are discussed in Sections 13.3 and 13.4. 

In the following section, Section 13.5, the problem of predicting the ex- 
pected value of the observation Yo at a given point xy and the problem of 
predicting a single value of Yo are discussed. Suitable predictors are provided, 
and also confidence intervals for them are constructed. 

The chapter is concluded with Section 13.7 indicating extensions of the
model discussed in this chapter to more general situations covering a much
wider class of applications.

13.1 Setting up the Model: The Principle of Least Squares


As has already been mentioned, Examples 22 and 23 in Chapter 1 provide 
motivation for the statistical model to be adopted and studied in this chapter. 
Example 22, in particular, will serve throughout the chapter to illustrate the 
underlying general results. For convenience, the data related to this example 
are reproduced here in Table 13.1. 








Table 13.1
The Data x = Undergraduate GPA and y = Score in the Graduate Management Aptitude Test (GMAT); There Are 34 (x, y) Pairs Altogether

DATA OF UNDERGRADUATE GPA (x) AND GMAT SCORE (y)

  x      y        x      y        x      y
 3.63   447      2.36   399      2.80   444
 3.59   588      2.36   482      3.13   416
 3.30   563      2.66   420      3.01   471
 3.40   553      2.68   414      2.79   490
 3.50   572      2.48   533      2.89   431
 3.78   591      2.46   509      2.91   446
 3.44   692      2.63   504      2.75   546
 3.48   528      2.44   336      2.73   467
 3.47   552      2.13   408      3.12   463
 3.35   520      2.41   469      3.08   440
 3.39   543      2.55   538      3.03   419
                                 3.00   509











The first question which arises is whether the pairs (x, y) are related at all
and, if they are, how. An indication that those pairs are, indeed, related is borne
out by the scatter plot depicted in Figure 13.1. Indeed, taking into consideration
that we are operating in a random environment, one sees a conspicuous, albeit
somewhat loose, linear relationship between the pairs (x, y).


Figure 13.1 Scatter Diagram for Table 13.1

So, we are not too far off the target by assuming that there is a straight line in the xy-plane which is “close” to most of the pairs $(x, y)$. The question now is how to quantify the term “close.” The first step toward this end is the adoption of the model described in relation (11) of Chapter 8. Namely, we assume that, for each $i = 1, \ldots, 34$, the respective $y_i$ is the observed value of a r.v. $Y_i$ associated with $x_i$, and if it were not for the random errors involved, the pairs $(x_i, y_i)$, $i = 1, \ldots, 34$ would lie on a straight line $y = \beta_1 + \beta_2 x$; i.e., we would have $y_i = \beta_1 + \beta_2 x_i$, $i = 1, \ldots, 34$. Thus, the r.v. $Y_i$ itself, whose $y_i$ are simply observed values, would be equal to $\beta_1 + \beta_2 x_i$ except for fluctuations due to a random error $e_i$. In other words, $Y_i = \beta_1 + \beta_2 x_i + e_i$. Next, arguing as in Section 8.4, it is reasonable to assume that the $e_i$'s are independent r.v.'s with $Ee_i = 0$ and $\mathrm{Var}(e_i) = \sigma^2$ for all $i$'s, so that one arrives at the model described in relation (11) of Chapter 8; namely, $Y_1, \ldots, Y_{34}$ are independent r.v.'s having the structure:

$$Y_i = \beta_1 + \beta_2 x_i + e_i, \quad \text{with } Ee_i = 0 \text{ and } \mathrm{Var}(e_i) = \sigma^2, \quad i = 1, \ldots, 34. \tag{1}$$

Set $EY_i = \eta_i$. Then, because of the errors involved, it is actually the pairs $(x_i, \eta_i)$, $i = 1, \ldots, 34$ which lie on a straight line $y = \beta_1 + \beta_2 x$; i.e., $\eta_i = \beta_1 + \beta_2 x_i$, $i = 1, \ldots, 34$. It is in the determination of a particular straight line where the Principle of Least Squares enters the picture. According to this principle, one argues as follows: On the basis of the model described in (1), what we would expect to have observed at $x_i$ would be $\eta_i$, whereas what is actually observed is $y_i$. Thus, there is a deviation measured by $y_i - \eta_i$, $i = 1, \ldots, 34$ (see Figure 13.1). Some of these deviations are positive, some are negative, and, perhaps, some are zero. In order to deal with nonnegative numbers, look at $|y_i - \eta_i|$, which is actually the distance between the points $(x_i, y_i)$ and $(x_i, \eta_i)$. Then, draw the line $y = \beta_1 + \beta_2 x$, so that these distances are simultaneously minimized. More formally, first look at the squares of these distances $(y_i - \eta_i)^2$, as it is much easier to work with squares as opposed to absolute values, and in order to account for the simultaneous minimization mentioned earlier, consider the sum $\sum_{i=1}^{34} (y_i - \eta_i)^2$ and seek its minimization. At this point, replace the observed value $y_i$ by the r.v. $Y_i$ itself and set

$$S(\mathbf{Y}, \boldsymbol{\beta}) = \sum_{i=1}^{34} (Y_i - \eta_i)^2 = \sum_{i=1}^{34} [Y_i - (\beta_1 + \beta_2 x_i)]^2 \left(= \sum_{i=1}^{34} e_i^2\right), \tag{2}$$

where $\mathbf{Y} = (Y_1, \ldots, Y_{34})$ and $\boldsymbol{\beta} = (\beta_1, \beta_2)$.

Then the Principle of Least Squares calls for the determination of $\beta_1$ and $\beta_2$ which minimize the sum of squares of errors; i.e., the quantity $S(\mathbf{Y}, \boldsymbol{\beta})$ in (2). The actual minimization is a calculus problem. If there is a unique straight line so determined, then, clearly, this would be the line which lies “close” to most pairs $(x_i, Y_i)$, $i = 1, \ldots, 34$, in the Least Squares sense. It will be seen below that this is, indeed, the case.
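To make the criterion in (2) concrete, the sketch below evaluates the sum of squares for two candidate lines. The four data points and both candidate lines are hypothetical (they are not taken from Table 13.1), and the function name `sum_of_squares` is introduced here only for illustration.

```python
def sum_of_squares(xs, ys, b1, b2):
    """S(y, beta): sum of [y_i - (b1 + b2 * x_i)]^2, as in relation (2)."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from Table 13.1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Two candidate lines y = b1 + b2 * x; the one with the smaller sum of
# squares lies "closer" to the data in the Least Squares sense.
s_a = sum_of_squares(xs, ys, 0.0, 2.0)
s_b = sum_of_squares(xs, ys, 1.0, 1.5)
print(s_a, s_b)
```

Here the line $y = 2x$ gives a much smaller sum of squares than $y = 1 + 1.5x$, so it is the better of these two candidates; the Principle of Least Squares asks for the best line among all candidates.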

In a more general setting, consider the model below:

$$\begin{aligned}
&Y_i = \beta_1 + \beta_2 x_i + e_i, \text{ where the random errors } e_i,\ i = 1, \ldots, n \text{ are i.i.d. r.v.'s with} \\
&Ee_i = 0 \text{ and } \mathrm{Var}(e_i) = \sigma^2, \text{ which imply that the r.v.'s } Y_i,\ i = 1, \ldots, n \\
&\text{are independent, but not identically distributed, with} \\
&EY_i = \eta_i = \beta_1 + \beta_2 x_i \text{ and } \mathrm{Var}(Y_i) = \sigma^2.
\end{aligned} \tag{3}$$

Let $\hat{\beta}_1$ and $\hat{\beta}_2$ be the unique values of $\beta_1$ and $\beta_2$, respectively, which minimize the sum of squares of errors $S(\mathbf{Y}, \boldsymbol{\beta}) = \sum_{i=1}^n [Y_i - (\beta_1 + \beta_2 x_i)]^2$ $\bigl(= \sum_{i=1}^n e_i^2\bigr)$. These values, which are functions of the $Y_i$'s as well as the $x_i$'s, are the Least Squares Estimates (LSE's) of $\beta_1$ and $\beta_2$. Any line $y = \beta_1 + \beta_2 x$ is referred to as a regression line and, in particular, the line $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$ is known as the fitted regression line. For this line, the $\hat{y}_i$'s corresponding to the $x_i$'s are $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$, $i = 1, \ldots, n$.





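13.2 The Least Squares Estimates of $\beta_1$ and $\beta_2$ and Some of their Properties

In this section, the LSE's of $\beta_1$ and $\beta_2$ are derived and some of their properties are obtained. Also, the (unknown) variance $\sigma^2$ is estimated.

THEOREM 1
In reference to the model described in (3), the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ of $\beta_1$ and $\beta_2$, respectively, are given by the following expressions (which are also appropriate for computational purposes):

$$\hat{\beta}_1 = \frac{\bigl(\sum_{i=1}^n x_i^2\bigr)\bigl(\sum_{i=1}^n Y_i\bigr) - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n x_i Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2}, \tag{4}$$

and

$$\hat{\beta}_2 = \frac{n \sum_{i=1}^n x_i Y_i - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2}. \tag{5}$$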

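PROOF Consider the partial derivatives:

$$\frac{\partial}{\partial \beta_1} S(\mathbf{Y}, \boldsymbol{\beta}) = -2 \sum_{i=1}^n (Y_i - \beta_1 - \beta_2 x_i) = -2\left(\sum_{i=1}^n Y_i - n\beta_1 - \beta_2 \sum_{i=1}^n x_i\right), \tag{6}$$

$$\frac{\partial}{\partial \beta_2} S(\mathbf{Y}, \boldsymbol{\beta}) = -2 \sum_{i=1}^n x_i (Y_i - \beta_1 - \beta_2 x_i) = -2\left(\sum_{i=1}^n x_i Y_i - \beta_1 \sum_{i=1}^n x_i - \beta_2 \sum_{i=1}^n x_i^2\right), \tag{7}$$

and solve the so-called normal equations: $\frac{\partial}{\partial \beta_1} S(\mathbf{Y}, \boldsymbol{\beta}) = 0$ and $\frac{\partial}{\partial \beta_2} S(\mathbf{Y}, \boldsymbol{\beta}) = 0$, or $n\beta_1 + \bigl(\sum_{i=1}^n x_i\bigr)\beta_2 = \sum_{i=1}^n Y_i$ and $\bigl(\sum_{i=1}^n x_i\bigr)\beta_1 + \bigl(\sum_{i=1}^n x_i^2\bigr)\beta_2 = \sum_{i=1}^n x_i Y_i$, to find:

$$\hat{\beta}_1 = \frac{\begin{vmatrix} \sum_{i=1}^n Y_i & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i Y_i & \sum_{i=1}^n x_i^2 \end{vmatrix}}{\begin{vmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{vmatrix}} = \frac{\bigl(\sum_{i=1}^n x_i^2\bigr)\bigl(\sum_{i=1}^n Y_i\bigr) - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n x_i Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2},$$

and

$$\hat{\beta}_2 = \frac{\begin{vmatrix} n & \sum_{i=1}^n Y_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i Y_i \end{vmatrix}}{\begin{vmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{vmatrix}} = \frac{n \sum_{i=1}^n x_i Y_i - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2}.$$

It remains to show that $\hat{\beta}_1$ and $\hat{\beta}_2$ actually minimize $S(\mathbf{Y}, \boldsymbol{\beta})$. From (6) and (7), we get:

$$\frac{\partial^2}{\partial \beta_1^2} S(\mathbf{Y}, \boldsymbol{\beta}) = 2n, \qquad \frac{\partial^2}{\partial \beta_1 \partial \beta_2} S(\mathbf{Y}, \boldsymbol{\beta}) = \frac{\partial^2}{\partial \beta_2 \partial \beta_1} S(\mathbf{Y}, \boldsymbol{\beta}) = 2\sum_{i=1}^n x_i, \qquad \frac{\partial^2}{\partial \beta_2^2} S(\mathbf{Y}, \boldsymbol{\beta}) = 2\sum_{i=1}^n x_i^2,$$

and the $2 \times 2$ matrix below is positive semidefinite for all $\beta_1$, $\beta_2$, since, for all $\lambda_1$, $\lambda_2$ reals not both 0,

$$\begin{aligned}
(\lambda_1\ \ \lambda_2) \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix}
&= \lambda_1^2 n + 2\lambda_1 \lambda_2 \sum_{i=1}^n x_i + \lambda_2^2 \sum_{i=1}^n x_i^2 \\
&= n\lambda_1^2 + 2n\lambda_1 \lambda_2 \bar{x} + \lambda_2^2 \left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right) + n\lambda_2^2 \bar{x}^2 \\
&= n(\lambda_1^2 + 2\lambda_1 \lambda_2 \bar{x} + \lambda_2^2 \bar{x}^2) + \lambda_2^2 \sum_{i=1}^n (x_i - \bar{x})^2 \\
&= n(\lambda_1 + \lambda_2 \bar{x})^2 + \lambda_2^2 \sum_{i=1}^n (x_i - \bar{x})^2 \ge 0.
\end{aligned}$$

This completes the proof of the theorem. ▲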
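Formulas (4) and (5) translate directly into a few lines of code. The sketch below uses hypothetical toy data and a helper name, `lse`, introduced here only for illustration; it also plugs the estimates back into the two normal equations to confirm that both are satisfied up to rounding.

```python
def lse(xs, ys):
    """Least Squares Estimates computed from formulas (4) and (5)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2  # equals 0 only if all x_i coincide
    b1 = (sxx * sy - sx * sxy) / denom
    b2 = (n * sxy - sx * sy) / denom
    return b1, b2


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b1, b2 = lse(xs, ys)

# Left-hand side minus right-hand side of each normal equation.
n = len(xs)
r1 = n * b1 + sum(xs) * b2 - sum(ys)
r2 = (sum(xs) * b1 + sum(x * x for x in xs) * b2
      - sum(x * y for x, y in zip(xs, ys)))
print(b1, b2, r1, r2)
```

For these data the estimates are $\hat{\beta}_1 = 0.15$ and $\hat{\beta}_2 = 1.94$, and both residuals `r1` and `r2` vanish up to floating-point error.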
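COROLLARY With $\bar{x} = (x_1 + \cdots + x_n)/n$ and $\bar{Y} = (Y_1 + \cdots + Y_n)/n$, the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ may also be written as follows (useful expressions for noncomputational purposes):

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{x}, \qquad \hat{\beta}_2 = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2} \sum_{i=1}^n (x_i - \bar{x}) Y_i. \tag{8}$$

PROOF First,

$$\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) &= \sum_{i=1}^n x_i Y_i - \bar{Y} \sum_{i=1}^n x_i - \bar{x} \sum_{i=1}^n Y_i + n\bar{x}\bar{Y} \\
&= \sum_{i=1}^n x_i Y_i - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n Y_i\right) - n\bar{x}\bar{Y} + n\bar{x}\bar{Y} \\
&= \frac{1}{n}\left[n \sum_{i=1}^n x_i Y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n Y_i\right)\right],
\end{aligned}$$

and

$$n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2 = n \sum_{i=1}^n x_i^2 - (n\bar{x})^2 = n\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right) = n \sum_{i=1}^n (x_i - \bar{x})^2.$$

Therefore

$$\frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{n \sum_{i=1}^n (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^n x_i Y_i - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2} = \hat{\beta}_2, \text{ on account of (5)}.$$

The second expression for $\hat{\beta}_2$ as a linear combination of the $Y_i$'s follows, because

$$\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^n (x_i - \bar{x}) Y_i - \bar{Y} \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x}) Y_i,$$

since

$$\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i - n\bar{x} = \sum_{i=1}^n x_i - \sum_{i=1}^n x_i = 0.$$

Finally,

$$\begin{aligned}
\bar{Y} - \hat{\beta}_2 \bar{x} &= \bar{Y} - \bar{x}\, \frac{n \sum_{i=1}^n x_i Y_i - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n Y_i\bigr)}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2} \\
&= \bar{Y} - \bar{x}\, \frac{n \sum_{i=1}^n x_i Y_i - n^2 \bar{x}\bar{Y}}{n \sum_{i=1}^n x_i^2 - n^2 \bar{x}^2} \\
&= \frac{n\bar{Y} \sum_{i=1}^n x_i^2 - n^2 \bar{x}^2 \bar{Y} - n\bar{x} \sum_{i=1}^n x_i Y_i + n^2 \bar{x}^2 \bar{Y}}{n \sum_{i=1}^n (x_i - \bar{x})^2} \\
&= \frac{\bigl(\sum_{i=1}^n x_i^2\bigr)\bigl(\sum_{i=1}^n Y_i\bigr) - \bigl(\sum_{i=1}^n x_i\bigr)\bigl(\sum_{i=1}^n x_i Y_i\bigr)}{n \sum_{i=1}^n (x_i - \bar{x})^2} = \hat{\beta}_1, \text{ by (4)}. \quad ▲
\end{aligned}$$

The following notation is suggested (at least in part) by the expressions in the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$, and it will be used extensively and conveniently throughout the rest of this chapter.

Set

$$SS_x = \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2 = \sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2,$$

and likewise

$$SS_y = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n Y_i^2 - n\bar{Y}^2 = \sum_{i=1}^n Y_i^2 - \frac{1}{n}\left(\sum_{i=1}^n Y_i\right)^2, \tag{9}$$

and

$$SS_{xy} = \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^n x_i Y_i - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n Y_i\right).$$

Then the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ may be rewritten as follows:

$$\hat{\beta}_1 = \frac{1}{n} \sum_{i=1}^n Y_i - \frac{\hat{\beta}_2}{n} \sum_{i=1}^n x_i, \qquad \hat{\beta}_2 = \frac{SS_{xy}}{SS_x}. \tag{10}$$

Also, recall that the fitted regression line is given by:

$$\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x \quad \text{and that} \quad \hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, \quad i = 1, \ldots, n. \tag{11}$$

Before we go any further, let us discuss the example below.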


EXAMPLE 1
In reference to Table 13.1, compute the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ and draw the fitted regression line $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$.

DISCUSSION The application of formula (10) calls for the calculation of $SS_x$ and $SS_{xy}$ given in (9). Table 13.2 facilitates the calculations.

Table 13.2

   x      y      x²        xy           x      y      x²        xy
  3.63   447   13.1769   1,622.61      3.48   528   12.1104   1,837.44
  3.59   588   12.8881   2,110.92      3.47   552   12.0409   1,915.44
  3.30   563   10.8900   1,857.90      3.35   520   11.2225   1,742.00
  3.40   553   11.5600   1,880.20      3.39   543   11.4921   1,840.77
  3.50   572   12.2500   2,002.00      2.36   399    5.5696     941.64
  3.78   591   14.2884   2,233.98      2.36   482    5.5696   1,137.52
  3.44   692   11.8336   2,380.48      2.66   420    7.0756   1,117.20
  2.68   414    7.1824   1,109.52      3.01   471    9.0601   1,417.71
  2.48   533    6.1504   1,321.84      2.79   490    7.7841   1,367.10
  2.46   509    6.0516   1,252.14      2.89   431    8.3521   1,245.59
  2.63   504    6.9169   1,325.52      2.91   446    8.4681   1,297.86
  2.44   336    5.9536     819.84      2.75   546    7.5625   1,501.50
  2.13   408    4.5369     869.04      2.73   467    7.4529   1,274.91
  2.41   469    5.8081   1,130.29      3.12   463    9.7344   1,444.56
  2.55   538    6.5025   1,371.90      3.08   440    9.4864   1,355.20
  2.80   444    7.8400   1,243.20      3.03   419    9.1809   1,269.57
  3.13   416    9.7969   1,302.08      3.00   509    9.0000   1,527.00

Totals
 50.35  8,577  153.6263  25,833.46    50.38  8,126  151.1622  24,233.01

From the totals,

$$\sum_i x_i = 100.73, \quad \sum_i y_i = 16{,}703, \quad \sum_i x_i^2 = 304.7885, \quad \sum_i x_i y_i = 50{,}066.47,$$

and then

$$SS_x = 304.7885 - \frac{(100.73)^2}{34} \simeq 304.7885 - 298.4274 \simeq 6.361,$$

$$SS_{xy} = 50{,}066.47 - \frac{(100.73) \times (16{,}703)}{34} \simeq 50{,}066.47 - 49{,}485.094 = 581.376.$$

Then

$$\hat{\beta}_2 = \frac{581.376}{6.361} \simeq 91.397 \quad \text{and} \quad \hat{\beta}_1 = \frac{16{,}703}{34} - (91.397) \times \frac{100.73}{34} \simeq 491.265 - 270.809 = 220.456,$$

and the fitted regression line $\hat{y} = 220.456 + 91.397x$ is depicted in Figure 13.2.
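The calculations of this example can be reproduced from the raw pairs of Table 13.1. The sketch below is one way to do so; the variable names are chosen here for illustration. Carrying full precision gives an intercept of about 220.49 rather than 220.456, because the text rounds $\bar{x}$ to 2.963 before multiplying by the slope.

```python
# (x, y) = (undergraduate GPA, GMAT score), transcribed from Table 13.1.
data = [
    (3.63, 447), (3.59, 588), (3.30, 563), (3.40, 553), (3.50, 572),
    (3.78, 591), (3.44, 692), (3.48, 528), (3.47, 552), (3.35, 520),
    (3.39, 543), (2.36, 399), (2.36, 482), (2.66, 420), (2.68, 414),
    (2.48, 533), (2.46, 509), (2.63, 504), (2.44, 336), (2.13, 408),
    (2.41, 469), (2.55, 538), (2.80, 444), (3.13, 416), (3.01, 471),
    (2.79, 490), (2.89, 431), (2.91, 446), (2.75, 546), (2.73, 467),
    (3.12, 463), (3.08, 440), (3.03, 419), (3.00, 509),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_xy = sum(x * y for x, y in data)

# Relation (9)
ss_x = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Relation (10)
b2 = ss_xy / ss_x
b1 = sum_y / n - b2 * sum_x / n

print(n, round(sum_x, 2), sum_y, round(sum_x2, 4), round(sum_xy, 2))
print(round(ss_x, 3), round(ss_xy, 3), round(b2, 3), round(b1, 3))
```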


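Figure 13.2 The Fitted Regression Line $\hat{y} = 220.456 + 91.397x$

The LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ have the desirable property of being unbiased, as shown in the following theorem.

THEOREM 2
The LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ are unbiased; i.e., $E\hat{\beta}_1 = \beta_1$ and $E\hat{\beta}_2 = \beta_2$. Furthermore,

$$\mathrm{Var}(\hat{\beta}_1) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right) \quad \text{and} \quad \mathrm{Var}(\hat{\beta}_2) = \frac{\sigma^2}{SS_x},$$

where $SS_x$ is given in (9).

PROOF In this proof and also elsewhere, the range of the summation is not explicitly indicated, since it is always from 1 to $n$. Consider $\hat{\beta}_2$ as given in (8). Then: $SS_x \hat{\beta}_2 = \sum_i (x_i - \bar{x}) Y_i$, so that, by taking expectations:

$$\begin{aligned}
SS_x E\hat{\beta}_2 &= \sum_i (x_i - \bar{x}) EY_i = \sum_i (x_i - \bar{x})(\beta_1 + \beta_2 x_i) \\
&= \beta_1 \sum_i (x_i - \bar{x}) + \beta_2 \sum_i x_i (x_i - \bar{x}) = \beta_2 \sum_i x_i (x_i - \bar{x}) \\
&= \beta_2 \left(\sum_i x_i^2 - n\bar{x}^2\right) = \beta_2 \sum_i (x_i - \bar{x})^2 = SS_x \beta_2.
\end{aligned}$$

Therefore, dividing through by $SS_x$, we get $E\hat{\beta}_2 = \beta_2$. Next, also from (8),

$$E\hat{\beta}_1 = E(\bar{Y} - \hat{\beta}_2 \bar{x}) = E\bar{Y} - \bar{x} E\hat{\beta}_2 = \frac{1}{n} \sum_i (\beta_1 + \beta_2 x_i) - \bar{x}\beta_2 = \beta_1 + \beta_2 \bar{x} - \beta_2 \bar{x} = \beta_1.$$

Regarding the variances, we have from (8): $SS_x \hat{\beta}_2 = \sum_i (x_i - \bar{x}) Y_i$, so that:

$$SS_x^2\, \mathrm{Var}(\hat{\beta}_2) = \mathrm{Var}\left(\sum_i (x_i - \bar{x}) Y_i\right) = \sum_i (x_i - \bar{x})^2\, \mathrm{Var}(Y_i) = \sigma^2 \sum_i (x_i - \bar{x})^2 = \sigma^2 SS_x,$$

so that $\mathrm{Var}(\hat{\beta}_2) = \sigma^2 / SS_x$. Finally, from (8),

$$\hat{\beta}_1 = \bar{Y} - \bar{x}\hat{\beta}_2 = \frac{1}{n} \sum_i Y_i - \frac{\bar{x}}{SS_x} \sum_i (x_i - \bar{x}) Y_i = \sum_i \left[\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{SS_x}\right] Y_i, \tag{12}$$

so that, the cross-product terms vanishing because $\sum_i (x_i - \bar{x}) = 0$,

$$\mathrm{Var}(\hat{\beta}_1) = \sigma^2 \sum_i \left[\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{SS_x}\right]^2 = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x^2}\, SS_x\right) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right). \quad ▲$$

EXAMPLE 2
DISCUSSION In reference to Example 1, the variances of the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ are given by:

$$\mathrm{Var}(\hat{\beta}_1) = \sigma^2 \left(\frac{1}{34} + \frac{8.777}{6.361}\right) \simeq \sigma^2 (0.029 + 1.380) = 1.409\sigma^2,$$

and

$$\mathrm{Var}(\hat{\beta}_2) = \frac{\sigma^2}{6.361} \simeq 0.157\sigma^2.$$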
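The two multipliers of $\sigma^2$ in this example follow directly from the variance formulas of Theorem 2. A brief check, using the rounded quantities of Example 1 (the variable names are chosen here for illustration):

```python
# Quantities from Example 1 (SS_x rounded to three decimals, as in the text).
n, sum_x, ss_x = 34, 100.73, 6.361

x_bar = sum_x / n
coef_b1 = 1 / n + x_bar ** 2 / ss_x  # Var(beta1_hat) divided by sigma^2
coef_b2 = 1 / ss_x                   # Var(beta2_hat) divided by sigma^2

print(round(coef_b1, 3), round(coef_b2, 3))
```

The variances themselves remain unknown until $\sigma^2$ is estimated; only their ratios to $\sigma^2$ are computed here.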
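In fitting a regression line, there are various deviations which occur. At this point, these deviations will be suitably attributed to several sources, certain pieces of terminology will be introduced, and also some formal relations will be established. To this end, look at the observable $Y_i$ and split it as follows: $Y_i = \hat{y}_i + (Y_i - \hat{y}_i)$. The component $\hat{y}_i$ represents the point $(x_i, \hat{y}_i)$ which lies on the fitted regression line $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$, and the difference $Y_i - \hat{y}_i$ is the deviation of $Y_i$ from $\hat{y}_i$. We may refer to the component $\hat{y}_i$ as that part of $Y_i$ which is due to the linear regression, or it is explained by the linear regression, and the component $Y_i - \hat{y}_i$ of $Y_i$ as the residual, or the deviation from the linear regression, or variability unexplained by the linear regression. We can go through the same arguments with reference to the sample mean $\bar{Y}$ of the $Y_i$'s. That is, we consider:

$$Y_i - \bar{Y} = (\hat{y}_i - \bar{Y}) + (Y_i - \hat{y}_i).$$

The interpretation of this decomposition is the same as the one given above, but with reference to $\bar{Y}$. Next, look at the squares of these quantities:

$$(Y_i - \bar{Y})^2, \quad (\hat{y}_i - \bar{Y})^2, \quad (Y_i - \hat{y}_i)^2,$$

and, finally, at their sums:

$$\sum_{i=1}^n (Y_i - \bar{Y})^2, \quad \sum_{i=1}^n (\hat{y}_i - \bar{Y})^2, \quad \sum_{i=1}^n (Y_i - \hat{y}_i)^2.$$

At this point, assume for a moment that:

$$\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{Y})^2 + \sum_{i=1}^n (Y_i - \hat{y}_i)^2. \tag{13}$$

Then this relation would state that the total variability (of the $Y_i$'s in reference to their mean $\bar{Y}$), $\sum_{i=1}^n (Y_i - \bar{Y})^2$, is the sum of the variability $\sum_{i=1}^n (\hat{y}_i - \bar{Y})^2$ due to the linear regression, or explained by the linear regression, and the residual variability, $\sum_{i=1}^n (Y_i - \hat{y}_i)^2$, or variability unexplained by the linear regression.

We proceed in proving relation (13).

THEOREM 3
Let $SS_T$ ($= SS_y$, see (9)), $SS_R$, and $SS_E$, respectively, be the total variability, the variability due to the linear regression (or explained by the linear regression), and the residual variability (or variability not explained by the linear regression); i.e.,

$$SS_T\ (= SS_y) = \sum_{i=1}^n (Y_i - \bar{Y})^2, \qquad SS_R = \sum_{i=1}^n (\hat{y}_i - \bar{Y})^2, \qquad SS_E = \sum_{i=1}^n (Y_i - \hat{y}_i)^2, \tag{14}$$

where $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$, $i = 1, \ldots, n$, the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ are given by (10) (or (8)), and $\bar{Y}$ is the mean of the $Y_i$'s. Then:

(i) $$SS_T = SS_R + SS_E. \tag{15}$$

Furthermore,

(ii) $$SS_T = SS_y, \qquad SS_R = \frac{SS_{xy}^2}{SS_x}, \qquad \text{and hence} \qquad SS_E = SS_y - \frac{SS_{xy}^2}{SS_x}, \tag{16}$$

where $SS_x$, $SS_y$, and $SS_{xy}$ are given in (9).

PROOF
(i) We have:

$$\begin{aligned}
SS_T = \sum_i (Y_i - \bar{Y})^2 &= \sum_i [(\hat{y}_i - \bar{Y}) + (Y_i - \hat{y}_i)]^2 \\
&= \sum_i (\hat{y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{y}_i)^2 + 2 \sum_i (\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) \\
&= SS_R + SS_E + 2 \sum_i (\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i).
\end{aligned}$$

So, we have to show that the last term on the right-hand side above is equal to 0. To this end, observe that $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$ and $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{x}$ (by (8)), so that

$$\hat{y}_i - \bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 x_i - \bar{Y} = \bar{Y} - \hat{\beta}_2 \bar{x} + \hat{\beta}_2 x_i - \bar{Y} = \hat{\beta}_2 (x_i - \bar{x}),$$

and

$$Y_i - \hat{y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i = Y_i - \bar{Y} + \hat{\beta}_2 \bar{x} - \hat{\beta}_2 x_i = (Y_i - \bar{Y}) - \hat{\beta}_2 (x_i - \bar{x}),$$

so that

$$(\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) = \hat{\beta}_2 (x_i - \bar{x})[(Y_i - \bar{Y}) - \hat{\beta}_2 (x_i - \bar{x})] = \hat{\beta}_2 (x_i - \bar{x})(Y_i - \bar{Y}) - \hat{\beta}_2^2 (x_i - \bar{x})^2.$$

Therefore, by (9) and (10):

$$\sum_i (\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) = \frac{SS_{xy}}{SS_x} \times SS_{xy} - \frac{SS_{xy}^2}{SS_x^2} \times SS_x = \frac{SS_{xy}^2}{SS_x} - \frac{SS_{xy}^2}{SS_x} = 0. \tag{17}$$

Thus, $SS_T = SS_R + SS_E$.

(ii) That $SS_T = SS_y$ is immediate from relations (9) and (14). Next, $\hat{y}_i - \bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 x_i - \bar{Y} = \bar{Y} - \hat{\beta}_2 \bar{x} + \hat{\beta}_2 x_i - \bar{Y} = \hat{\beta}_2 (x_i - \bar{x})$ (by (8) and (11)), so that, by (10),

$$SS_R = \sum_i (\hat{y}_i - \bar{Y})^2 = \hat{\beta}_2^2 \sum_i (x_i - \bar{x})^2 = \frac{SS_{xy}^2}{SS_x^2} \times SS_x = \frac{SS_{xy}^2}{SS_x},$$

as was to be seen. Finally,

$$\begin{aligned}
SS_E = \sum_i (Y_i - \hat{y}_i)^2 &= \sum_i [(Y_i - \bar{Y}) - (\hat{y}_i - \bar{Y})]^2 \\
&= \sum_i (Y_i - \bar{Y})^2 + \sum_i (\hat{y}_i - \bar{Y})^2 - 2 \sum_i (\hat{y}_i - \bar{Y})(Y_i - \bar{Y}) \\
&= SS_T + SS_R - 2 \sum_i (\hat{y}_i - \bar{Y})(Y_i - \bar{Y}),
\end{aligned}$$

and

$$\sum_i (\hat{y}_i - \bar{Y})(Y_i - \bar{Y}) = \sum_i (\hat{y}_i - \bar{Y})[(Y_i - \hat{y}_i) + (\hat{y}_i - \bar{Y})] = \sum_i (\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) + \sum_i (\hat{y}_i - \bar{Y})^2 = SS_R \quad \text{(by (17))}.$$

It follows that $SS_E = SS_T - SS_R = SS_y - \frac{SS_{xy}^2}{SS_x}$, as was to be seen. ▲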


This section is closed with some remarks. 


REMARK 1 


(i) The quantities $SS_T$, $SS_R$, and $SS_E$, given in (14), are computed by way of $SS_x$, $SS_y$, and $SS_{xy}$ given in (9). This is so because of (16).

(ii) In the next section, an estimate of the (unknown) variance $\sigma^2$ will also be given, based on the residual variability $SS_E$. That this should be the case is intuitively clear by the nature of $SS_E$, and it will be formally justified in the following section.

(iii) From the relation $SS_T = SS_R + SS_E$ given in (15) and the definition of the variability due to regression, $SS_R$, given in (14), it follows that the better the regression fit is, the smaller the value of $SS_E$ is. Then the ratio of $SS_R$ to the total variability $SS_T$, $r = SS_R/SS_T$, can be used as an index of how good the linear regression fit is.
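Relations (15) and (16) can be verified numerically by computing the three sums of squares directly from the fitted values. The sketch below does so on hypothetical toy data; all variable names are introduced here for illustration.

```python
# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Relation (9)
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Relations (10) and (11)
b2 = ss_xy / ss_x
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * x for x in xs]

# Relation (14), computed directly from the definitions
ss_t = ss_y
ss_r = sum((f - y_bar) ** 2 for f in fitted)
ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r = ss_r / ss_t  # index of fit from Remark 1(iii)
print(round(ss_t, 4), round(ss_r, 4), round(ss_e, 4), round(r, 4))
```

For these data the directly computed $SS_R$ and $SS_E$ add up to $SS_T$, and $SS_R$ agrees with $SS_{xy}^2/SS_x$, as the theorem asserts.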


13.3 Normally Distributed Errors: MLE's of $\beta_1$, $\beta_2$, and $\sigma^2$, Some Distributional Results

It is to be noticed that in the linear regression model as defined in relation (3), no distribution assumption about the errors $e_i$, and therefore the r.v.'s $Y_i$, was made. Such an assumption was necessary neither for the construction of the LSE's of $\beta_1$, $\beta_2$, nor in proving their unbiasedness and in calculating their variances. However, in order to be able to construct confidence intervals for $\beta_1$ and $\beta_2$ and test hypotheses about them, among other things, we have to assume a distribution for the $e_i$'s. The $e_i$'s being errors, it is not unreasonable to assume that they are Normally distributed, and we shall do so. Then the model (3) is supplemented as follows:

$$\begin{aligned}
&Y_i = \beta_1 + \beta_2 x_i + e_i, \text{ where } e_i,\ i = 1, \ldots, n \text{ are independent r.v.'s} \\
&\sim N(0, \sigma^2), \text{ which implies that } Y_i,\ i = 1, \ldots, n \text{ are} \\
&\text{independent r.v.'s and } Y_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2).
\end{aligned} \tag{18}$$
We now proceed with the following theorem. 





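THEOREM 4
Under model (18):

(i) The LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$ of $\beta_1$ and $\beta_2$, respectively, are also MLE's.
(ii) The MLE $\hat{\sigma}^2$ of $\sigma^2$ is given by: $\hat{\sigma}^2 = SS_E/n$.
(iii) The estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are Normally distributed as follows:

$$\hat{\beta}_1 \sim N\left(\beta_1,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right)\right), \qquad \hat{\beta}_2 \sim N\left(\beta_2,\ \frac{\sigma^2}{SS_x}\right),$$

where $SS_x$ is given in (9).

PROOF

(i) The likelihood function of the $Y_i$'s is given by:

$$L(y_1, \ldots, y_n; \beta_1, \beta_2, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n \exp\left[-\frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2\right].$$

For each fixed $\sigma^2$, maximization of the likelihood function with respect to $\beta_1$ and $\beta_2$ is, clearly, equivalent to minimization of $\sum_i (y_i - \beta_1 - \beta_2 x_i)^2$ with respect to $\beta_1$ and $\beta_2$, which minimization has produced the LSE's $\hat{\beta}_1$ and $\hat{\beta}_2$.

(ii) The MLE of $\sigma^2$ is to be found by maximizing, with respect to $\sigma^2$, the expression:

$$\log L(y_1, \ldots, y_n; \hat{\beta}_1, \hat{\beta}_2, \sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} SS_E,$$

since, by (14) and (11), $\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2 = \sum_i (y_i - \hat{y}_i)^2 = SS_E$. From this expression, we get:

$$\frac{d}{d\sigma^2} \log L(y_1, \ldots, y_n; \hat{\beta}_1, \hat{\beta}_2, \sigma^2) = -\frac{n}{2} \times \frac{1}{\sigma^2} + \frac{SS_E}{2(\sigma^2)^2} = 0,$$

so that $\sigma^2 = SS_E/n$. Since

$$\frac{d^2}{d(\sigma^2)^2} \log L(y_1, \ldots, y_n; \hat{\beta}_1, \hat{\beta}_2, \sigma^2)\Big|_{\sigma^2 = SS_E/n} = -\frac{n^3}{2 SS_E^2} < 0,$$

it follows that $\hat{\sigma}^2 = SS_E/n$ is, indeed, the MLE of $\sigma^2$.

(iii) From (12), we have: $\hat{\beta}_1 = \sum_i \left[\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{SS_x}\right] Y_i$, and we have also seen in Theorem 2 that:

$$E\hat{\beta}_1 = \beta_1, \qquad \mathrm{Var}(\hat{\beta}_1) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right).$$

Thus, $\hat{\beta}_1$ is Normally distributed as a linear combination of independent Normally distributed r.v.'s, and its mean and variance must be as stated above. Next, from (8), we have that: $\hat{\beta}_2 = \sum_i \left(\frac{x_i - \bar{x}}{SS_x}\right) Y_i$, so that, as above, $\hat{\beta}_2$ is Normally distributed. Its mean and variance have been computed in Theorem 2 and they are $\beta_2$ and $\sigma^2/SS_x$, respectively. ▲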
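The maximization in part (ii) of the proof can also be checked numerically. The sketch below evaluates the log-likelihood, with $\beta_1$ and $\beta_2$ already replaced by their LSE's, on a grid of candidate values of $\sigma^2$. The values of $n$ and $SS_E$ are hypothetical, and the function name `log_lik` is introduced here for illustration.

```python
from math import log, pi


def log_lik(sigma2, ss_e, n):
    """Log-likelihood at the LSE's, viewed as a function of sigma^2 alone."""
    return -n / 2 * log(2 * pi) - n / 2 * log(sigma2) - ss_e / (2 * sigma2)


# Hypothetical values, for illustration only.
n, ss_e = 10, 50.0
sigma2_mle = ss_e / n

# Candidate values from 0.5 to 1.5 times SS_E / n.
grid = [sigma2_mle * (0.5 + 0.05 * k) for k in range(21)]
best = max(grid, key=lambda s: log_lik(s, ss_e, n))
print(sigma2_mle, best)
```

On this grid the largest log-likelihood is attained at $SS_E/n$, in agreement with the sign of the second derivative computed in the proof.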


Before proceeding further, we return to Example 1 and compute an estimate for $\sigma^2$. We also discuss Example 23 in Chapter 1 and an additional example introduced here.

EXAMPLE 3
In reference to Example 1, determine the MLE of $\sigma^2$.

DISCUSSION By Theorem 4(ii), this estimate is: $\hat{\sigma}^2 = \frac{SS_E}{n}$. For the computation of $SS_E$ by (16), we have to have the quantity $\sum_i y_i^2$ from Table 13.2, which is calculated to be:

$$\sum_i y_i^2 = 8{,}373{,}295. \tag{19}$$

Then, by (9),

$$SS_y = 8{,}373{,}295 - \frac{(16{,}703)^2}{34} \simeq 8{,}373{,}295 - 8{,}205{,}594.382 = 167{,}700.618,$$

and therefore

$$SS_E = 167{,}700.618 - \frac{(581.376)^2}{6.361} \simeq 167{,}700.618 - 53{,}135.993 = 114{,}564.625;$$

i.e.,

$$SS_E = 114{,}564.625 \quad \text{and then} \quad \hat{\sigma}^2 = \frac{114{,}564.625}{34} \simeq 3{,}369.548.$$

Since $SS_T = SS_y = 167{,}700.618$ and $SS_R = 53{,}135.993$, it follows that only $\frac{53{,}135.993}{167{,}700.618} \simeq 31.685\%$ of the variability is explained by linear regression and $\frac{114{,}564.625}{167{,}700.618} \simeq 68.315\%$ is not explained by linear regression. The obvious outlier (3.44, 692) may be mainly responsible for it.
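The arithmetic of this example can be retraced from the summary quantities already obtained. A short sketch, using the rounded values of $SS_x$ and $SS_{xy}$ from Example 1 (the variable names are chosen here for illustration):

```python
n = 34
sum_y, sum_y2 = 16703, 8373295  # from Table 13.2 and relation (19)
ss_x, ss_xy = 6.361, 581.376    # rounded values from Example 1

ss_y = sum_y2 - sum_y ** 2 / n  # total variability, SS_T = SS_y
ss_r = ss_xy ** 2 / ss_x        # variability explained by the regression
ss_e = ss_y - ss_r              # residual variability
sigma2_mle = ss_e / n           # MLE of sigma^2, Theorem 4(ii)
explained = 100 * ss_r / ss_y   # percentage explained

print(round(ss_y, 3), round(ss_r, 3), round(ss_e, 3))
print(round(sigma2_mle, 3), round(explained, 3))
```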





EXAMPLE 4
In reference to Example 23 in Chapter 1, assume a linear relationship between the dose of a compost fertilizer $x$ and the yield of a crop $y$. On the basis of the following summary data recorded:

$$n = 15, \quad \bar{x} = 10.8, \quad \bar{y} = 122.7, \quad SS_x = 70.6, \quad SS_y = 98.5, \quad SS_{xy} = 68.3:$$

(i) Determine the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, and draw the fitted regression line.
(ii) Give the MLE $\hat{\sigma}^2$ of $\sigma^2$.
(iii) Over the range of $x$ values covered in the study, what would your conjecture be regarding the average increase in yield per unit increase in the compost dose?

DISCUSSION
(i) By (10),

$$\hat{\beta}_2 = \frac{68.3}{70.6} \simeq 0.967 \quad \text{and} \quad \hat{\beta}_1 = 122.7 - 0.967 \times 10.8 \simeq 112.256,$$

and $\hat{y} = 112.256 + 0.967x$.

(ii) We have: $\hat{\sigma}^2 = \frac{SS_E}{n}$, where $SS_E = 98.5 - \frac{(68.3)^2}{70.6} \simeq 32.425$, so that $\hat{\sigma}^2 = \frac{32.425}{15} \simeq 2.162$.

(iii) The conjecture would be a number close to the slope of the fitted regression line, which is 0.967 (Figure 13.3).

Figure 13.3 The Fitted Regression Line $\hat{y} = 112.256 + 0.967x$
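Since only summary data are given in this example, the estimates follow from relations (10) and (16) in a few lines. A sketch (the variable names are chosen here for illustration); the intercept computed with the unrounded slope differs from 112.256 in the third decimal place.

```python
# Summary data of Example 4.
n, x_bar, y_bar = 15, 10.8, 122.7
ss_x, ss_y, ss_xy = 70.6, 98.5, 68.3

b2 = ss_xy / ss_x                # slope, relation (10)
b1 = y_bar - b2 * x_bar          # intercept, relation (8)
ss_e = ss_y - ss_xy ** 2 / ss_x  # residual variability, relation (16)
sigma2_mle = ss_e / n            # MLE of sigma^2

print(round(b2, 3), round(b1, 3), round(ss_e, 3), round(sigma2_mle, 3))
```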




















EXAMPLE 5
In one stage of the development of a new medication for an allergy, an experiment is conducted to study how different dosages of the medication affect the duration of relief from the allergic symptoms. Ten patients are included in the experiment. Each patient receives a specific dosage of the medication and is asked to report back as soon as the protection of the medication seems to wear off. The observations are recorded in Table 13.3, which shows the dosage ($x$) and respective duration of relief ($y$) for the 10 patients.

(i) Draw the scatter diagram of the data in Table 13.3 (which indicate tendency toward linear dependence).
(ii) Compute the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, and draw the fitted regression line.
(iii) What percentage of the total variability is explained by the linear regression and what percentage remains unexplained?
(iv) Compute the MLE $\hat{\sigma}^2$ of $\sigma^2$.

Table 13.3
Dosage (x) (in milligrams) and the Number of Days of Relief (y) from Allergy for 10 Patients

            x      y      x²      y²      xy
            3      9       9      81      27
            3      5       9      25      15
            4     12      16     144      48
            5      9      25      81      45
            6     14      36     196      84
            6     16      36     256      96
            7     22      49     484     154
            8     18      64     324     144
            8     24      64     576     192
            9     22      81     484     198
Totals     59    151     389   2,651   1,003

DISCUSSION
(i), (ii) First, $SS_x = 389 - \frac{59^2}{10} = 40.9$ and $SS_{xy} = 1{,}003 - \frac{59 \times 151}{10} = 112.1$, and hence:

$$\hat{\beta}_2 = \frac{112.1}{40.9} \simeq 2.741 \quad \text{and} \quad \hat{\beta}_1 = \frac{151}{10} - 2.741 \times \frac{59}{10} \simeq -1.072.$$

Then the fitted regression line is $\hat{y} = -1.072 + 2.741x$ (Figure 13.4).

(iii) Since $SS_T = SS_y = 2{,}651 - \frac{151^2}{10} = 370.9$ and $SS_R = \frac{(112.1)^2}{40.9} \simeq 307.247$, it follows that $SS_E = 370.9 - 307.247 = 63.653$. Therefore $\frac{307.247}{370.9} \simeq 82.838\%$ of the variability is explained by the linear regression and $\frac{63.653}{370.9} \simeq 17.162\%$ remains unexplained.

(iv) We have: $\hat{\sigma}^2 = \frac{63.653}{10} = 6.3653 \simeq 6.365$.

Figure 13.4 Scatter Diagram and the Fitted Regression Line $\hat{y} = -1.072 + 2.741x$
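All parts of this example other than the plots can be reproduced from the ten pairs of Table 13.3. The following sketch uses variable names chosen here for illustration; with the unrounded slope the intercept comes out as about $-1.071$ rather than $-1.072$.

```python
# Dosage (x) and days of relief (y), transcribed from Table 13.3.
xs = [3, 3, 4, 5, 6, 6, 7, 8, 8, 9]
ys = [9, 5, 12, 9, 14, 16, 22, 18, 24, 22]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

b2 = ss_xy / ss_x                # slope
b1 = sum_y / n - b2 * sum_x / n  # intercept
ss_r = ss_xy ** 2 / ss_x         # explained variability
ss_e = ss_y - ss_r               # residual variability
explained = 100 * ss_r / ss_y    # percentage explained
sigma2_mle = ss_e / n            # MLE of sigma^2

print(round(b2, 3), round(b1, 3))
print(round(ss_r, 3), round(ss_e, 3), round(explained, 3), round(sigma2_mle, 3))
```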

















For the purpose of constructing confidence intervals for the parameters of the model, and also testing hypotheses about them, we have to know the distribution of $SS_E$ and also establish independence of the statistics $\hat{\beta}_1$ and $SS_E$, as well as independence of the statistics $\hat{\beta}_2$ and $SS_E$. The relevant results are stated in the following theorem, whose proof is deferred to Section 13.6.

THEOREM 5
Under model (18):

(i) The distribution of $SS_E/\sigma^2$ is $\chi^2_{n-2}$.
(ii) The following statistics are independent:

(a) $SS_E$ and $\hat{\beta}_2$; (b) $\bar{Y}$ and $\hat{\beta}_2$; (c) $SS_E$, $\bar{Y}$ and $\hat{\beta}_2$; (d) $SS_E$ and $\hat{\beta}_1$.

PROOF Deferred to Section 13.6.
To this theorem, in conjunction with Theorem 4, there is the following 
corollary. 


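COROLLARY Under model (18):

(i) The MLE $\hat{\sigma}^2$ of $\sigma^2$ is a biased estimate of $\sigma^2$, but $\frac{n}{n-2}\hat{\sigma}^2 = \frac{SS_E}{n-2}$, call it $S^2$, is an unbiased estimate of $\sigma^2$.

$$\text{(ii)}\ \ \frac{\hat{\beta}_1 - \beta_1}{S\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}} \sim t_{n-2}, \qquad \text{(iii)}\ \ \frac{\hat{\beta}_2 - \beta_2}{S/\sqrt{SS_x}} \sim t_{n-2}, \tag{20}$$

where

$$S^2 = SS_E/(n - 2). \tag{21}$$

PROOF

(i) It has been seen in Theorem 4(ii) that $\hat{\sigma}^2 = \frac{SS_E}{n} = \frac{n-2}{n} \times \frac{SS_E}{n-2}$. Since $\frac{SS_E}{\sigma^2} \sim \chi^2_{n-2}$, it follows that $E\bigl(\frac{SS_E}{\sigma^2}\bigr) = n - 2$, or $E\bigl(\frac{SS_E}{n-2}\bigr) = \sigma^2$, so that $\frac{SS_E}{n-2}$ is an unbiased estimate of $\sigma^2$. Also, $E\hat{\sigma}^2 = \frac{n-2}{n} E\bigl(\frac{SS_E}{n-2}\bigr) = \frac{n-2}{n}\sigma^2$, so that $\hat{\sigma}^2$ is biased.

(ii) By Theorem 4(iii),

$$\frac{\hat{\beta}_1 - \beta_1}{\text{s.d.}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}} \sim N(0, 1),$$

and $\frac{SS_E}{\sigma^2} = \frac{(n-2)S^2}{\sigma^2} \sim \chi^2_{n-2}$. Furthermore, $\frac{\hat{\beta}_1 - \beta_1}{\text{s.d.}(\hat{\beta}_1)}$ and $\frac{SS_E}{\sigma^2}$ are independent, since $\hat{\beta}_1$ and $SS_E$ are so. It follows that:

$$\frac{(\hat{\beta}_1 - \beta_1)/\text{s.d.}(\hat{\beta}_1)}{\sqrt{\frac{SS_E}{\sigma^2}/(n-2)}} \sim t_{n-2}, \quad \text{or} \quad \frac{(\hat{\beta}_1 - \beta_1)\Big/\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}}{\sqrt{S^2/\sigma^2}} \sim t_{n-2},$$

or, finally,

$$\frac{\hat{\beta}_1 - \beta_1}{S\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}} \sim t_{n-2}.$$

(iii) Again by Theorem 4(iii),

$$\frac{\hat{\beta}_2 - \beta_2}{\text{s.d.}(\hat{\beta}_2)} = \frac{\hat{\beta}_2 - \beta_2}{\sigma/\sqrt{SS_x}} \sim N(0, 1),$$

and $\frac{\hat{\beta}_2 - \beta_2}{\text{s.d.}(\hat{\beta}_2)}$ and $\frac{SS_E}{\sigma^2}$ are independent, since $\hat{\beta}_2$ and $SS_E$ are so. Then:

$$\frac{(\hat{\beta}_2 - \beta_2)/\text{s.d.}(\hat{\beta}_2)}{\sqrt{\frac{SS_E}{\sigma^2}/(n-2)}} \sim t_{n-2}, \quad \text{or} \quad \frac{(\hat{\beta}_2 - \beta_2)\big/\frac{\sigma}{\sqrt{SS_x}}}{\sqrt{S^2/\sigma^2}} \sim t_{n-2},$$

or, finally, $\frac{\hat{\beta}_2 - \beta_2}{S/\sqrt{SS_x}} \sim t_{n-2}$. ▲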

3.1 Verify the result ¿2 log(m, ..., Yn; Bi, Bo, lisos. = — yg as claimed in 


the proof of Theorem 4(ii), where t = 0?. 


3.2 Consider Table 13.1 and leave out the “outlier” pairs (8.63, 447), (8.44, 
692), and (2.44, 336). Then recalculate all quantities below: 


Say Sow Jog, Doze So uP, SSe SSy SS 
i i i i i 
3.3 Use the calculations in Exercise 3.2 to compute the estimates ĝ1, $2 and 
the fitted regression line. 


3.4 Refer to Exercise 3.2, and compute the variances Var(ĝ1), Var(B2), and 
the MLE of o?. 


3.5 By Theorem 5, the r.v. $SS_E/\sigma^2$ is distributed as $\chi^2_{n-2}$. Therefore, in the
usual manner,

$$\left[\frac{SS_E}{\chi^2_{n-2;\,\alpha/2}},\ \frac{SS_E}{\chi^2_{n-2;\,1-\alpha/2}}\right]$$

is a confidence interval for $\sigma^2$ with confidence coefficient $1 - \alpha$, where
$SS_E$ is given in (16) and (9). That is, $SS_E = SS_y - \frac{SS_{xy}^2}{SS_x}$, where $SS_x$, $SS_y$,
and $SS_{xy}$ are given in (9).

(i) Refer to Example 1 (see also Example 3), and construct a 95% confidence interval for $\sigma^2$.
(ii) Refer to Example 4 and do the same as in part (i).
(iii) Refer to Example 5 and do the same as in part (i).
(iv) Refer to Exercise 3.2 and do the same as in part (i).


3.6 Consider the linear regression model given in relation (18), and let $x_0$
be an unknown point at which observations $Y_{0i}$, $i = 1, \ldots, m$, are taken.
It is assumed that the $Y_{0i}$'s and the $Y_j$'s are independent, and set $\bar Y_0 =
\frac{1}{m}\sum_{i=1}^m Y_{0i}$. Set $y = (y_1, \ldots, y_n)$, $y_0 = (y_{01}, \ldots, y_{0m})$ for the observed
values of the $Y_j$'s and $Y_{0i}$'s, and form their joint log-likelihood function:

$$\Lambda = \Lambda(\beta_1, \beta_2, \sigma^2, x_0) = \log L(\beta_1, \beta_2, \sigma^2, x_0 \mid y, y_0)$$

$$= -\frac{m+n}{2}\log(2\pi) - \frac{m+n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left[\sum_{j=1}^n (y_j - \beta_1 - \beta_2 x_j)^2 + \sum_{i=1}^m (y_{0i} - \beta_1 - \beta_2 x_0)^2\right].$$

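(i) Show that the log-likelihood equations $\frac{\partial\Lambda}{\partial\beta_1} = 0$, $\frac{\partial\Lambda}{\partial\beta_2} = 0$, and $\frac{\partial\Lambda}{\partial x_0} = 0$
produce the equations:

$$(m+n)\beta_1 + (m x_0 + n\bar x)\beta_2 = m\bar y_0 + n\bar y \tag{a}$$

$$(m x_0 + n\bar x)\beta_1 + \Big(m x_0^2 + \sum_j x_j^2\Big)\beta_2 = m x_0 \bar y_0 + \sum_j x_j y_j \tag{b}$$

$$\beta_1 + x_0\beta_2 = \bar y_0. \tag{c}$$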


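(ii) In (c), solve for $\beta_1$, $\beta_1 = \bar y_0 - x_0\beta_2$, replace it in (a) and (b), and solve
for $\beta_2$ to obtain, by assuming here and in the sequel that all divisions
and cancellations are legitimate,

$$\beta_2 = \frac{\bar y - \bar y_0}{\bar x - x_0}, \qquad \beta_2 = \frac{\sum_j x_j y_j - n\bar x\,\bar y_0}{\sum_j x_j^2 - n x_0 \bar x}. \tag{d}$$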


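(iii) Equate the $\beta_2$'s in (ii), and solve for $x_0$ to obtain:

$$x_0 = \frac{\bar y_0\sum_j (x_j - \bar x)^2 + \bar x\sum_j x_j y_j - \bar y\sum_j x_j^2}{\sum_j x_j y_j - n\bar x\,\bar y} = \frac{n\bar y_0\sum_j (x_j - \bar x)^2 + \sum_j x_j\sum_j x_j y_j - \sum_j y_j\sum_j x_j^2}{n\sum_j x_j y_j - \sum_j x_j\sum_j y_j}. \tag{e}$$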


(iv) Replace $x_0$ in the first expression for $\beta_2$ in (d) in order to get, after
some simplifications:

$$\hat\beta_2 = \frac{n\sum_j x_j y_j - \big(\sum_j x_j\big)\big(\sum_j y_j\big)}{n\sum_j x_j^2 - \big(\sum_j x_j\big)^2}, \tag{f}$$

and observe that this expression is the MLE (LSE) of $\beta_2$ calculated
on the basis of $y_j$ and $x_j$, $j = 1, \ldots, n$, only (see relation (5) and
Theorem 4(i)).

(v) Replace $x_0$ and $\beta_2$ in the expression $\beta_1 = \bar y_0 - x_0\beta_2$ in order to arrive
at the expression

$$\hat\beta_1 = \frac{\big(\sum_j x_j^2\big)\big(\sum_j y_j\big) - \big(\sum_j x_j\big)\big(\sum_j x_j y_j\big)}{n\sum_j x_j^2 - \big(\sum_j x_j\big)^2}, \tag{g}$$

after some calculations, and observe that this is the MLE (LSE) of $\beta_1$
calculated on the basis of $y_j$ and $x_j$, $j = 1, \ldots, n$, only (see relation
(4) and Theorem 4(i)).
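(vi) It follows that the MLE's of $\beta_1$, $\beta_2$, and $x_0$, to be denoted by $\hat\beta_1$, $\hat\beta_2$, and
$\hat x_0$, are given by the expressions:

$$\hat\beta_1 = \frac{\big(\sum_j x_j^2\big)\big(\sum_j y_j\big) - \big(\sum_j x_j\big)\big(\sum_j x_j y_j\big)}{n\sum_j x_j^2 - \big(\sum_j x_j\big)^2},$$

$$\hat\beta_2 = \frac{n\sum_j x_j y_j - \big(\sum_j x_j\big)\big(\sum_j y_j\big)}{n\sum_j x_j^2 - \big(\sum_j x_j\big)^2},$$

and

$$\hat x_0 = \frac{\bar y_0 - \hat\beta_1}{\hat\beta_2}.$$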


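(vii) Differentiate the log-likelihood function with respect to $\sigma^2$, equate
the derivative to zero, and replace $\beta_1$, $\beta_2$, and $x_0$ by their MLE's in
order to obtain the MLE $\hat\sigma^2$ of $\sigma^2$, which is given by the expression:

$$\hat\sigma^2 = \frac{1}{m+n}(SS_E + SS_{0E}),$$

where

$$SS_E = \sum_{j=1}^n (y_j - \hat y_j)^2 = \sum_{j=1}^n (y_j - \hat\beta_1 - \hat\beta_2 x_j)^2,$$

and

$$SS_{0E} = \sum_{i=1}^m (y_{0i} - \hat\beta_1 - \hat\beta_2\hat x_0)^2 = \sum_{i=1}^m (y_{0i} - \bar y_0)^2.$$

Also, by means of (14) and (16),

$$SS_E = SS_y - \frac{SS_{xy}^2}{SS_x}, \quad\text{where}\quad SS_x = \sum_{j=1}^n x_j^2 - \frac{1}{n}\Big(\sum_{j=1}^n x_j\Big)^2,$$

$$SS_y = \sum_{j=1}^n y_j^2 - \frac{1}{n}\Big(\sum_{j=1}^n y_j\Big)^2, \qquad SS_{xy} = \sum_{j=1}^n x_j y_j - \frac{1}{n}\Big(\sum_{j=1}^n x_j\Big)\Big(\sum_{j=1}^n y_j\Big),$$

and

$$SS_{0E} = \sum_{i=1}^m y_{0i}^2 - \frac{1}{m}\Big(\sum_{i=1}^m y_{0i}\Big)^2.$$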


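(viii) Observe that, by Theorem 5(i), $\frac{SS_E}{\sigma^2} \sim \chi^2_{n-2}$, whereas $\frac{SS_{0E}}{\sigma^2} \sim \chi^2_{m-1}$.
Then, by independence of the $Y_j$'s and the $Y_{0i}$'s, it follows that
$\frac{1}{\sigma^2}(SS_E + SS_{0E}) \sim \chi^2_{m+n-3}$.

(ix) Observe that, by Theorem 8(i), $\hat y_0 = \hat\beta_1 + \hat\beta_2 x_0 \sim N\Big(\beta_1 + \beta_2 x_0,\ \sigma^2\Big(\frac{1}{n} + \frac{(x_0-\bar x)^2}{SS_x}\Big)\Big)$,
whereas $\bar Y_0 \sim N\big(\beta_1 + \beta_2 x_0, \frac{\sigma^2}{m}\big)$, and $\hat y_0$ and $\bar Y_0$ are independent,
so that the r.v. $V = \bar Y_0 - \hat y_0 \sim N(0, \sigma_V^2)$, where $\sigma_V^2 = \sigma^2\Big(\frac{1}{m} + \frac{1}{n} + \frac{(x_0-\bar x)^2}{SS_x}\Big)$,
and $\frac{V}{\sigma_V} \sim N(0, 1)$.

(x) Observe that, by Theorem 5, the r.v.'s $V/\sigma_V$ and $(SS_E + SS_{0E})/\sigma^2$ are
independent, so that

$$\frac{V/\sigma_V}{\frac{1}{\sigma}\sqrt{\frac{SS_E + SS_{0E}}{m+n-3}}} = \frac{V\Big/\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(x_0-\bar x)^2}{SS_x}}}{\sqrt{\frac{SS_E + SS_{0E}}{m+n-3}}} = \frac{\sqrt{m+n-3}\;V}{\sqrt{\Big[\frac{1}{m} + \frac{1}{n} + \frac{(x_0-\bar x)^2}{SS_x}\Big](SS_E + SS_{0E})}} \sim t_{m+n-3}.$$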


13.4 Confidence Intervals and Hypotheses Testing Problems


The results obtained in the corollary to Theorem 5 allow the construction 
of confidence intervals for the parameters of the model, as well as testing 
hypotheses about them. 











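THEOREM 6
Under model (18), $100(1-\alpha)\%$ confidence intervals for $\beta_1$ and $\beta_2$ are
given, respectively, by:

$$\left[\hat\beta_1 - t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}},\ \ \hat\beta_1 + t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}}\right] \tag{22}$$

and

$$\left[\hat\beta_2 - t_{n-2;\,\alpha/2}\frac{S}{\sqrt{SS_x}},\ \ \hat\beta_2 + t_{n-2;\,\alpha/2}\frac{S}{\sqrt{SS_x}}\right], \tag{23}$$

where $S = \sqrt{SS_E/(n-2)}$, and $SS_E$, $SS_x$, and $\hat\beta_1$, $\hat\beta_2$ are given by (16), (9),
and (10).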








PROOF The confidence intervals in (22) and (23) follow immediately from
results (ii) and (iii), respectively, in the corollary to Theorem 5, and the familiar
procedure of constructing confidence intervals. ▲

REMARK 2 A confidence interval can also be constructed for $\sigma^2$ on the
basis of the statistic $SS_E$ and the fact that $SS_E/\sigma^2$ is distributed as $\chi^2_{n-2}$.
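
The intervals (22) and (23) depend on the data only through $n$, $\bar x$, $SS_x$, $SS_E$, $\hat\beta_1$, and $\hat\beta_2$, so they are easy to evaluate from summary statistics. The sketch below is illustrative only: the function name `coefficient_intervals` is hypothetical, and the quantile $t_{n-2;\alpha/2}$ is assumed to be read from a $t$-table and passed in by the caller. The sample call uses the rounded summary values quoted for Example 1 in this chapter.

```python
import math


def coefficient_intervals(n, x_bar, ss_x, ss_e, beta1_hat, beta2_hat, t_quantile):
    """Evaluate intervals (22) and (23) from summary statistics.

    t_quantile is t_{n-2; alpha/2}, supplied by the caller from a t-table.
    """
    s = math.sqrt(ss_e / (n - 2))                                    # S as in (21)
    half1 = t_quantile * s * math.sqrt(1.0 / n + x_bar ** 2 / ss_x)  # half-width of (22)
    half2 = t_quantile * s / math.sqrt(ss_x)                         # half-width of (23)
    return ((beta1_hat - half1, beta1_hat + half1),
            (beta2_hat - half2, beta2_hat + half2))


# Rounded summary values quoted for Example 1; t_{32;0.025} = 2.0369.
interval_b1, interval_b2 = coefficient_intervals(
    n=34, x_bar=2.963, ss_x=6.361, ss_e=114564.625,
    beta1_hat=220.456, beta2_hat=91.397, t_quantile=2.0369)
print(interval_b1)
print(interval_b2)
```

Because the code carries full floating-point precision through the intermediate steps, its endpoints may differ in the second decimal place from hand calculations that round at each step.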

Procedures for testing some hypotheses are summarized below in the form
of a theorem. The tests proposed here have an obvious intuitive interpretation.
However, their justification rests in that they are likelihood ratio tests. For the
case of simple hypotheses, this fact can be established directly. For composite
hypotheses, it follows as a special case of more general results of testing
hypotheses regarding the entire mean $\eta = \beta_1 + \beta_2 x$. See, e.g., Chapter 16
and, in particular, Examples 2 and 3 in the book A Course in Mathematical
Statistics, 2nd edition (1997), Academic Press, by G. G. Roussas.





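THEOREM 7
Under model (18):

(i) For testing the hypothesis $H_0$: $\beta_1 = \beta_{10}$ against the alternative $H_A$:
$\beta_1 \neq \beta_{10}$ at level of significance $\alpha$, the null hypothesis $H_0$ is rejected
whenever

$$|t| > t_{n-2;\,\alpha/2}, \quad\text{where}\quad t = (\hat\beta_1 - \beta_{10})\Big/ S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}}. \tag{24}$$

(ii) For testing the hypothesis $H_0$: $\beta_2 = \beta_{20}$ against the alternative $H_A$:
$\beta_2 \neq \beta_{20}$ at level of significance $\alpha$, the null hypothesis $H_0$ is rejected
whenever

$$|t| > t_{n-2;\,\alpha/2}, \quad\text{where}\quad t = (\hat\beta_2 - \beta_{20})\Big/ \frac{S}{\sqrt{SS_x}}. \tag{25}$$

When the alternative is of the form $H_A$: $\beta_2 > \beta_{20}$, the null hypothesis
is rejected whenever $t > t_{n-2;\,\alpha}$, and it is rejected whenever
$t < -t_{n-2;\,\alpha}$ if the alternative is of the form $H_A$: $\beta_2 < \beta_{20}$.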











REMARK 3

(i) In the reference cited above, the test statistic used actually has the $F$-distribution
under the null hypothesis. It should be recalled, however, that
if $t$ has the $t$-distribution with $r$ d.f., i.e., $t = Z/\sqrt{\chi^2_r/r}$, where $\chi^2_r$ has the
$\chi^2$-distribution with $r$ d.f., $Z \sim N(0, 1)$, and $Z$ and $\chi^2_r$ are independent, then
$t^2 = \frac{Z^2}{\chi^2_r/r}$ has the $F$-distribution with 1 and $r$ d.f.
(ii) Hypotheses can also be tested about $\sigma^2$ on the basis of the fact that
$\frac{SS_E}{\sigma^2} \sim \chi^2_{n-2}$.
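
The decision rule for the slope in (25), together with its one-sided variants, can be sketched in a few lines. This is an illustration under stated assumptions rather than a prescribed implementation: the function name `slope_t_test` and the `alternative` labels are hypothetical, and the critical value ($t_{n-2;\alpha/2}$ for a two-sided test, $t_{n-2;\alpha}$ for a one-sided test) is assumed to come from a $t$-table. The sample call uses the rounded summary values quoted for Example 4 in this chapter.

```python
import math


def slope_t_test(n, ss_x, ss_e, beta2_hat, beta20, t_critical,
                 alternative="two-sided"):
    """Return the statistic t of (25) and whether H0: beta2 = beta20 is rejected.

    t_critical is t_{n-2; alpha/2} for the two-sided alternative and
    t_{n-2; alpha} for a one-sided alternative (caller's responsibility).
    """
    s = math.sqrt(ss_e / (n - 2))
    t = (beta2_hat - beta20) / (s / math.sqrt(ss_x))
    if alternative == "two-sided":
        reject = abs(t) > t_critical
    elif alternative == "greater":        # H_A: beta2 > beta20
        reject = t > t_critical
    elif alternative == "less":           # H_A: beta2 < beta20
        reject = t < -t_critical
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, reject


# Rounded summary values quoted for Example 4; t_{13;0.05} = 1.7709.
t_obs, reject = slope_t_test(n=15, ss_x=70.6, ss_e=98.5, beta2_hat=0.967,
                             beta20=0.0, t_critical=1.7709,
                             alternative="greater")
print(t_obs, reject)
```

By Remark 3(i), squaring the two-sided statistic gives an equivalent $F$-test with 1 and $n-2$ d.f.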


EXAMPLE 6
In reference to Example 1:

(i) Construct 95% confidence intervals for $\beta_1$ and $\beta_2$.
(ii) Test the hypothesis that the GMAT scores increase with increasing GPA
scores.


DISCUSSION (i) The required confidence intervals are given by (22) and
(23). In the discussion of Example 1, we have found that: $\bar x \simeq 2.963$,
$SS_x \simeq 6.361$, $\hat\beta_1 \simeq 220.456$, and $\hat\beta_2 \simeq 91.397$. Also, in the discussion of Example
3, we saw that $SS_E \simeq 114{,}564.625$, so that, by (21), $S = \left(\frac{114{,}564.625}{32}\right)^{1/2} \simeq 59.834$.
Finally, $t_{32;\,0.025} = 2.0369$. Then

$$\hat\beta_1 - t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}} \simeq 220.456 - 2.0369 \times 59.834 \times \sqrt{\frac{1}{34} + \frac{(2.963)^2}{6.361}}$$

$$\simeq 220.456 - 121.876 \times 1.187 \simeq 220.456 - 144.667 = 75.789,$$

$$\hat\beta_1 + t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}} \simeq 220.456 + 144.667 = 365.123.$$

So the required interval is [75.789, 365.123].
Likewise, $t_{n-2;\,\alpha/2}\frac{S}{\sqrt{SS_x}} \simeq 2.0369 \times \frac{59.834}{\sqrt{6.361}} \simeq 2.0369 \times 23.725 \simeq 48.325$, and
therefore the required interval for $\beta_2$ is: [43.072, 139.722].

(ii) Here we are to test $H_0$: $\beta_2 = 0$ against the alternative $H_A$: $\beta_2 > 0$. Let us
take $\alpha = 0.05$, so that $t_{32;\,0.05} = 1.6939$. The observed value of the test statistic
is:

$$t = \frac{\hat\beta_2 - \beta_{20}}{S/\sqrt{SS_x}} \simeq \frac{91.397}{23.725} \simeq 3.852,$$

and the null hypothesis is rejected; the GMAT scores increase along with increasing
GPA scores.


EXAMPLE 7
In reference to Example 4:

(i) Construct 95% confidence intervals for $\beta_1$ and $\beta_2$.
(ii) Test the hypothesis that crop yield increases with increasing compost
fertilizer amounts.


DISCUSSION (i) In the discussion of Example 4, we have seen that: $n =
15$, $\bar x = 10.8$, $SS_x = 70.6$, $SS_E = 98.5$, $\hat\beta_1 \simeq 112.256$, and $\hat\beta_2 \simeq 0.967$. It follows
that:

$$S = \left(\frac{98.5}{13}\right)^{1/2} \simeq 2.753 \quad\text{and}\quad S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}} \simeq 2.753 \times \sqrt{\frac{1}{15} + \frac{(10.8)^2}{70.6}} \simeq 2.753 \times 1.311 \simeq 3.609.$$

Since $t_{13;\,0.025} = 2.1604$, it follows that the required observed confidence interval
for $\beta_1$ is: $[112.256 - 2.1604 \times 3.609,\ 112.256 + 2.1604 \times 3.609]$, or [104.459,
120.053].

Next, $\frac{S}{\sqrt{SS_x}} \simeq \frac{2.753}{\sqrt{70.6}} \simeq 0.328$, and $t_{13;\,0.025}\frac{S}{\sqrt{SS_x}} \simeq 2.1604 \times 0.328 \simeq 0.709$,
so that the required observed confidence interval for $\beta_2$ is: [0.258, 1.676].

(ii) The hypothesis to be tested is $H_0$: $\beta_2 = 0$ against the alternative $H_A$: $\beta_2 > 0$.
Take $\alpha = 0.05$, so that $t_{13;\,0.05} = 1.7709$. The observed value of the test statistic
is:

$$t = \frac{\hat\beta_2 - \beta_{20}}{S/\sqrt{SS_x}} \simeq \frac{0.967}{0.328} \simeq 2.948,$$

and therefore the null hypothesis is rejected. Consequently, crop yield increases
with increasing amounts of compost fertilizer.


EXAMPLE 8
In reference to Example 5:
(i) Construct 95% confidence intervals for $\beta_1$ and $\beta_2$.
(ii) Test the hypothesis that the duration of relief increases with higher dosages
of the medication.


DISCUSSION (i) From the discussion of Example 5, we have: $n = 10$,
$\bar x = 5.9$, $SS_x = 40.9$, $SS_E = 63.653$, $\hat\beta_1 = -1.072$, and $\hat\beta_2 \simeq 2.741$. Then $S =
\left(\frac{63.653}{8}\right)^{1/2} \simeq 2.821$. Also, $t_{8;\,0.025} = 2.3060$. Therefore:

$$t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{\bar x^2}{SS_x}} = 2.306 \times 2.821 \times \sqrt{\frac{1}{10} + \frac{(5.9)^2}{40.9}} \simeq 6.505 \times 0.975 \simeq 6.342.$$

Hence the required observed confidence interval for $\beta_1$ is: $[-1.072 - 6.342,\
-1.072 + 6.342]$, or $[-7.414, 5.270]$. Next,

$$t_{n-2;\,\alpha/2}\frac{S}{\sqrt{SS_x}} = 2.306 \times \frac{2.821}{\sqrt{40.9}} \simeq 2.306 \times 0.441 \simeq 1.017,$$

and therefore the required observed confidence interval for $\beta_2$ is: $[2.741 -
1.017,\ 2.741 + 1.017]$, or [1.724, 3.758].

(ii) The hypothesis to be tested is $H_0$: $\beta_2 = 0$ against $H_A$: $\beta_2 > 0$, and let us
take $\alpha = 0.05$, so that $t_{8;\,0.05} = 1.8595$. The observed value of the test statistic
is:

$$t = \frac{\hat\beta_2 - \beta_{20}}{S/\sqrt{SS_x}} \simeq \frac{2.741}{0.441} \simeq 6.215,$$

and the null hypothesis is rejected. Thus, increased dosages of medication
provide longer duration of relief.


4.1 Refer to Exercises 3.2 and 3.4, and compute 95% confidence intervals for
$\beta_1$ and $\beta_2$.


4.2 Refer to Exercises 3.3 and 4.1, and test the hypotheses $H_0$: $\beta_1 = 300$
against $H_A$: $\beta_1 \neq 300$, and $H_0$: $\beta_2 = 60$ against $H_A$: $\beta_2 \neq 60$, each at level
of significance $\alpha = 0.05$.


4.3 Refer to Example 5 and:
(i) Derive 95% confidence intervals for $\beta_1$ and $\beta_2$.
(ii) Test the hypothesis $H_0$: $\beta_1 = -1$ against the alternative $H_A$: $\beta_1 \neq -1$
at level of significance $\alpha = 0.05$.
(iii) Do the same for the hypothesis $H_0$: $\beta_2 = 3$ against the alternative $H_A$:
$\beta_2 \neq 3$ at the same level $\alpha = 0.05$.


4.4 Suppose the observations $Y_1, \ldots, Y_n$ are of the following structure: $Y_i =
\beta + \gamma(x_i - \bar x) + e_i$, where $\beta$ and $\gamma$ are parameters and the $e_i$'s are independent
r.v.'s with mean 0 and unknown variance $\sigma^2$.

(i) Set $t_i = x_i - \bar x$, $i = 1, \ldots, n$, and observe that the model $Y_i = \beta + \gamma t_i + e_i$
is of the standard form (1) with $\beta_1 = \beta$, $\beta_2 = \gamma$, and the additional
property that $\sum_{i=1}^n t_i = 0$, or $\bar t = 0$.

(ii) Use expressions (5) and (8) to conclude that the LSE's of $\beta$ and $\gamma$ are
given by:

$$\hat\beta = \bar Y, \qquad \hat\gamma = \frac{1}{SS_t}\sum_{i=1}^n t_i Y_i.$$

(iii) Employ Theorem 4 in order to conclude that:

$$\hat\beta \sim N\left(\beta, \frac{\sigma^2}{n}\right) \quad\text{and}\quad \hat\gamma \sim N\left(\gamma, \frac{\sigma^2}{SS_t}\right),$$

where (by (9)) $SS_t = \sum_{i=1}^n t_i^2$.

(iv) Determine the form of the confidence intervals for $\beta$ and $\gamma$ from
relations (22) and (23).

(v) Determine the expression of the test statistics by means of relations
(24) and (25).

(vi) What do the confidence intervals in relation (29) and in Theorem 9
(iii) become here?


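4.5 Consider the linear regression models: $Y_i = \beta_1 + \beta_2 x_i + e_i$, $i = 1, \ldots, m$,
and $Y_j^* = \beta_1^* + \beta_2^* x_j^* + e_j^*$, $j = 1, \ldots, n$, where the random errors $e_1, \ldots, e_m$
and $e_1^*, \ldots, e_n^*$ are i.i.d. r.v.'s distributed as $N(0, \sigma^2)$.

(i) The independence of $e_1, \ldots, e_m$ and $e_1^*, \ldots, e_n^*$ implies independence
of $Y_1, \ldots, Y_m$ and $Y_1^*, \ldots, Y_n^*$. Then write out the joint likelihood of
the $Y_i$'s and the $Y_j^*$'s and observe that the MLE's of $\beta_1$, $\beta_2$, $\beta_1^*$, $\beta_2^*$, and
$\sigma^2$, in obvious notation, are given by:

$$\hat\beta_1 = \bar Y - \hat\beta_2\bar x, \qquad \hat\beta_2 = \frac{m\sum_{i=1}^m x_i Y_i - \big(\sum_{i=1}^m x_i\big)\big(\sum_{i=1}^m Y_i\big)}{m\sum_{i=1}^m x_i^2 - \big(\sum_{i=1}^m x_i\big)^2},$$

$$\hat\beta_1^* = \bar Y^* - \hat\beta_2^*\bar x^*, \qquad \hat\beta_2^* = \frac{n\sum_{j=1}^n x_j^* Y_j^* - \big(\sum_{j=1}^n x_j^*\big)\big(\sum_{j=1}^n Y_j^*\big)}{n\sum_{j=1}^n x_j^{*2} - \big(\sum_{j=1}^n x_j^*\big)^2},$$

$$\hat\sigma^2 = (SS_E + SS_E^*)/(m+n),$$

where

$$SS_E = \sum_{i=1}^m (Y_i - \hat\beta_1 - \hat\beta_2 x_i)^2 = SS_y - \frac{SS_{xy}^2}{SS_x},$$

$$SS_x = \sum_{i=1}^m x_i^2 - \frac{1}{m}\Big(\sum_{i=1}^m x_i\Big)^2, \qquad SS_y = \sum_{i=1}^m Y_i^2 - \frac{1}{m}\Big(\sum_{i=1}^m Y_i\Big)^2,$$

$$SS_{xy} = \sum_{i=1}^m x_i Y_i - \frac{1}{m}\Big(\sum_{i=1}^m x_i\Big)\Big(\sum_{i=1}^m Y_i\Big),$$

and

$$SS_E^* = \sum_{j=1}^n (Y_j^* - \hat\beta_1^* - \hat\beta_2^* x_j^*)^2 = SS_y^* - \frac{SS_{xy}^{*2}}{SS_x^*},$$

$$SS_x^* = \sum_{j=1}^n x_j^{*2} - \frac{1}{n}\Big(\sum_{j=1}^n x_j^*\Big)^2, \qquad SS_y^* = \sum_{j=1}^n Y_j^{*2} - \frac{1}{n}\Big(\sum_{j=1}^n Y_j^*\Big)^2,$$

$$SS_{xy}^* = \sum_{j=1}^n x_j^* Y_j^* - \frac{1}{n}\Big(\sum_{j=1}^n x_j^*\Big)\Big(\sum_{j=1}^n Y_j^*\Big).$$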


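(ii) In accordance with Theorem 4, observe that

$$\hat\beta_1 \sim N\left(\beta_1,\ \sigma^2\Big(\frac{1}{m} + \frac{\bar x^2}{SS_x}\Big)\right), \qquad \hat\beta_2 \sim N\left(\beta_2,\ \frac{\sigma^2}{SS_x}\right),$$

$$\hat\beta_1^* \sim N\left(\beta_1^*,\ \sigma^2\Big(\frac{1}{n} + \frac{\bar x^{*2}}{SS_x^*}\Big)\right), \qquad \hat\beta_2^* \sim N\left(\beta_2^*,\ \frac{\sigma^2}{SS_x^*}\right),$$

$$\frac{SS_E + SS_E^*}{\sigma^2} \sim \chi^2_{m+n-4}.$$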
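(iii) From part (ii) and Theorem 5, conclude that

$$\frac{\sqrt{m+n-4}\,\big[(\hat\beta_1 - \hat\beta_1^*) - (\beta_1 - \beta_1^*)\big]}{\sqrt{(SS_E + SS_E^*)\Big(\frac{1}{m} + \frac{1}{n} + \frac{\bar x^2}{SS_x} + \frac{\bar x^{*2}}{SS_x^*}\Big)}} \sim t_{m+n-4},$$

and

$$\frac{\sqrt{m+n-4}\,\big[(\hat\beta_2 - \hat\beta_2^*) - (\beta_2 - \beta_2^*)\big]}{\sqrt{(SS_E + SS_E^*)\Big(\frac{1}{SS_x} + \frac{1}{SS_x^*}\Big)}} \sim t_{m+n-4}.$$

(iv) From part (iii), observe that the two regression lines can be compared
through the test of the hypotheses $H_0$: $\beta_1 = \beta_1^*$ against the alternative
$H_A$: $\beta_1 \neq \beta_1^*$, and $H_0'$: $\beta_2 = \beta_2^*$ against the alternative $H_A'$: $\beta_2 \neq \beta_2^*$ by
using the respective test statistics:

$$t = \frac{\sqrt{m+n-4}\,(\hat\beta_1 - \hat\beta_1^*)}{\sqrt{(SS_E + SS_E^*)\Big(\frac{1}{m} + \frac{1}{n} + \frac{\bar x^2}{SS_x} + \frac{\bar x^{*2}}{SS_x^*}\Big)}}, \qquad t' = \frac{\sqrt{m+n-4}\,(\hat\beta_2 - \hat\beta_2^*)}{\sqrt{(SS_E + SS_E^*)\Big(\frac{1}{SS_x} + \frac{1}{SS_x^*}\Big)}}.$$

At level of significance $\alpha$, the hypothesis $H_0$ is rejected when $|t| >
t_{m+n-4;\,\alpha/2}$, and the hypothesis $H_0'$ is rejected when $|t'| > t_{m+n-4;\,\alpha/2}$.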

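(v) Again from part (iii), observe that 95% confidence intervals for $\beta_1 - \beta_1^*$
and $\beta_2 - \beta_2^*$ are given by (with $\alpha = 0.05$):

$$(\hat\beta_1 - \hat\beta_1^*) \pm t_{m+n-4;\,\alpha/2}\sqrt{\frac{SS_E + SS_E^*}{m+n-4}\Big(\frac{1}{m} + \frac{1}{n} + \frac{\bar x^2}{SS_x} + \frac{\bar x^{*2}}{SS_x^*}\Big)}$$

and

$$(\hat\beta_2 - \hat\beta_2^*) \pm t_{m+n-4;\,\alpha/2}\sqrt{\frac{SS_E + SS_E^*}{m+n-4}\Big(\frac{1}{SS_x} + \frac{1}{SS_x^*}\Big)},$$

respectively.
(vi) Finally, from part (ii) conclude that a 95% confidence interval for $\sigma^2$
is given by (again with $\alpha = 0.05$):

$$\left[\frac{SS_E + SS_E^*}{\chi^2_{m+n-4;\,\alpha/2}},\ \frac{SS_E + SS_E^*}{\chi^2_{m+n-4;\,1-\alpha/2}}\right].$$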


13.5 Some Prediction Problems


According to model (18), the expectation of the observation $Y_i$ at $x_i$ is $EY_i =
\beta_1 + \beta_2 x_i$. Now, suppose $x_0$ is a point distinct from all $x_i$'s, but lying in the range
that the $x_i$'s span, and we wish to predict the expected value of the observation
$Y_0$ at $x_0$; i.e., $EY_0 = \beta_1 + \beta_2 x_0$. An obvious predictor for $EY_0$ is the statistic $\hat y_0$
given by the expression below and modified as indicated:

$$\hat y_0 = \hat\beta_1 + \hat\beta_2 x_0 = (\bar Y - \hat\beta_2\bar x) + \hat\beta_2 x_0 = \bar Y + (x_0 - \bar x)\hat\beta_2. \tag{26}$$

The result below gives the distribution of $\hat y_0$, which also provides for the
construction of a confidence interval for $\beta_1 + \beta_2 x_0$.

THEOREM 8
Under model (18) and with $\hat y_0$ given by (26), we have:

$$\text{(i)}\quad \frac{\hat y_0 - (\beta_1 + \beta_2 x_0)}{\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}} \sim N(0, 1). \tag{27}$$

$$\text{(ii)}\quad \frac{\hat y_0 - (\beta_1 + \beta_2 x_0)}{S\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}} \sim t_{n-2}. \tag{28}$$

(iii) A $100(1-\alpha)\%$ confidence interval for $\beta_1 + \beta_2 x_0$ is given by:

$$\left[\hat y_0 - t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}},\ \ \hat y_0 + t_{n-2;\,\alpha/2}\,S\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}\right]. \tag{29}$$

It is recalled that $S = \sqrt{SS_E/(n-2)}$, $SS_E = SS_y - \frac{SS_{xy}^2}{SS_x}$, and $SS_x$ and $SS_y$
are given in (9).











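PROOF (i) The assumption that $Y_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2)$, $i = 1, \ldots, n$, independent
implies that $\sum_i Y_i \sim N(n\beta_1 + \beta_2\sum_i x_i,\ n\sigma^2)$ and hence

$$\bar Y \sim N(\beta_1 + \beta_2\bar x,\ \sigma^2/n). \tag{30}$$

By Theorem 4(iii), $\hat\beta_2 \sim N(\beta_2, \sigma^2/SS_x)$, so that

$$(x_0 - \bar x)\hat\beta_2 \sim N\left((x_0 - \bar x)\beta_2,\ \frac{\sigma^2 (x_0 - \bar x)^2}{SS_x}\right). \tag{31}$$

Furthermore, by Theorem 5(ii)(b), $\bar Y$ and $\hat\beta_2$ are independent. Then, relations
(26), (30), and (31) yield:

$$\hat y_0 = \bar Y + (x_0 - \bar x)\hat\beta_2 \sim N\left(\beta_1 + \beta_2 x_0,\ \sigma^2\Big(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}\Big)\right), \tag{32}$$

and then (27) follows by standardization.


(ii) By Theorem 5(ii)(c), $SS_E$ is independent of $\bar Y$ and $\hat\beta_2$ and hence independent
of $\hat y_0$ because of (26). Furthermore, by Theorem 5(i),

$$\frac{SS_E}{\sigma^2} = \frac{(n-2)S^2}{\sigma^2} \sim \chi^2_{n-2}. \tag{33}$$

Therefore

$$\frac{[\hat y_0 - (\beta_1 + \beta_2 x_0)]\Big/\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}}{\sqrt{\frac{(n-2)S^2}{\sigma^2}\Big/(n-2)}} = \frac{\hat y_0 - (\beta_1 + \beta_2 x_0)}{S\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}} \sim t_{n-2},$$

which is relation (28).

(iii) This part follows immediately from part (ii) and the standard procedure
of setting up confidence intervals. ▲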
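
Relations (26) and (29) translate directly into a short computation. The sketch below is illustrative only: the function name `mean_response_interval` is hypothetical, the quantile $t_{n-2;\alpha/2}$ is assumed to come from a $t$-table, and the sample call uses the rounded summary values quoted for Example 1 with the point $x_0 = 3.25$.

```python
import math


def mean_response_interval(n, x_bar, ss_x, ss_e, beta1_hat, beta2_hat,
                           x0, t_quantile):
    """Return the predictor (26) of E(Y0) and the confidence interval (29)."""
    y0_hat = beta1_hat + beta2_hat * x0                       # relation (26)
    s = math.sqrt(ss_e / (n - 2))
    half = t_quantile * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    return y0_hat, (y0_hat - half, y0_hat + half)


# Rounded summary values quoted for Example 1; t_{32;0.025} = 2.0369.
y0_hat, ci = mean_response_interval(
    n=34, x_bar=2.963, ss_x=6.361, ss_e=114564.625,
    beta1_hat=220.456, beta2_hat=91.397, x0=3.25, t_quantile=2.0369)
ci_half_width = (ci[1] - ci[0]) / 2
print(y0_hat, ci)
```

The half-width grows with $(x_0 - \bar x)^2$, so the interval is narrowest at $x_0 = \bar x$ and widens as $x_0$ moves away from the center of the observed $x_i$'s.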


Finally, we would like to consider the problem of predicting a single response
at a given point $x_0$ rather than its expected value. Call $Y_0$ the response
corresponding to $x_0$ and, reasonably enough, assume that $Y_0$ is independent of
the $Y_i$'s. The predictor for $Y_0$ is $\hat y_0$, the same as the one given in (26). The objective
here is to construct a prediction interval for $Y_0$. This is done indirectly
in the following result.

THEOREM 9
Under model (18), let $Y_0$ be the (unobserved) observation at $x_0$, and
assume that $Y_0$ is independent of the $Y_i$'s. Predict $Y_0$ by $\hat y_0 = \hat\beta_1 + \hat\beta_2 x_0$.
Then:

$$\text{(i)}\quad \frac{\hat y_0 - Y_0}{\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}} \sim N(0, 1). \tag{34}$$

$$\text{(ii)}\quad \frac{\hat y_0 - Y_0}{S\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}} \sim t_{n-2}. \tag{35}$$

(iii) A $100(1-\alpha)\%$ prediction interval for $Y_0$ is given by:

$$\left[\hat y_0 - t_{n-2;\,\alpha/2}\,S\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}},\ \ \hat y_0 + t_{n-2;\,\alpha/2}\,S\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}}\right],$$

where $S$ and $SS_x$ are as in Theorem 8(ii), (iii).














PROOF (i) We have: $Y_0 = \beta_1 + \beta_2 x_0 + e_0$, predicted by $\hat y_0 = \hat\beta_1 + \hat\beta_2 x_0$. Then
$EY_0 = \beta_1 + \beta_2 x_0$ and $E\hat y_0 = \beta_1 + \beta_2 x_0$, so that $E(\hat y_0 - Y_0) = 0$. In deriving the
distribution of $\hat y_0 - Y_0$, we need its variance. By (26), we have:

$$Var(\hat y_0 - Y_0) = Var(\bar Y + (x_0 - \bar x)\hat\beta_2 - Y_0) = Var(\bar Y) + (x_0 - \bar x)^2\,Var(\hat\beta_2) + Var(Y_0)$$

(since all three r.v.'s, $\bar Y$, $\hat\beta_2$, and $Y_0$, are independent)

$$= \frac{\sigma^2}{n} + (x_0 - \bar x)^2\times\frac{\sigma^2}{SS_x} + \sigma^2 \quad\text{(by Theorem 2)}$$

$$= \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}\right]; \text{ i.e.,}$$

$$E(\hat y_0 - Y_0) = 0 \quad\text{and}\quad Var(\hat y_0 - Y_0) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}\right].$$

Since $\hat y_0$ and $Y_0$ are independent and $Y_0 \sim N(\beta_1 + \beta_2 x_0, \sigma^2)$, then these facts
along with (32) yield:

$$\hat y_0 - Y_0 \sim N\left(0,\ \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}\right]\right).$$

Relation (34) follows by standardizing $\hat y_0 - Y_0$.

(ii) It has been argued in the proof of Theorem 8(ii) that $S$ and $\hat y_0$ are independent.
It follows that $S$ and $\hat y_0 - Y_0$ are also independent. Then, dividing the
expression on the left-hand side in (34) by $\sqrt{\frac{(n-2)S^2}{\sigma^2}\big/(n-2)} = \frac{S}{\sigma}$ in (33), we
obtain the result in (35), after some simplifications.

(iii) This part follows from part (ii) through the usual procedure of setting up
confidence intervals. ▲
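
The prediction interval of Theorem 9(iii) differs from the confidence interval (29) only by the extra term 1 under the square root, which accounts for the variability of the new observation $Y_0$ itself. A minimal sketch follows; the function name `prediction_interval` is hypothetical, the $t$-quantile is assumed to be supplied from a table, and the sample call reuses the rounded summary values quoted for Example 1 with $x_0 = 3.25$.

```python
import math


def prediction_interval(n, x_bar, ss_x, ss_e, beta1_hat, beta2_hat,
                        x0, t_quantile):
    """Return the predictor of Y0 and the prediction interval of Theorem 9(iii)."""
    y0_hat = beta1_hat + beta2_hat * x0
    s = math.sqrt(ss_e / (n - 2))
    # The leading 1.0 is the variance contribution of the new observation Y0.
    half = t_quantile * s * math.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    return y0_hat, (y0_hat - half, y0_hat + half)


# Rounded summary values quoted for Example 1; t_{32;0.025} = 2.0369.
y0_pred, pi = prediction_interval(
    n=34, x_bar=2.963, ss_x=6.361, ss_e=114564.625,
    beta1_hat=220.456, beta2_hat=91.397, x0=3.25, t_quantile=2.0369)
pi_half_width = (pi[1] - pi[0]) / 2
print(y0_pred, pi)
```

For these inputs the prediction interval is several times wider than the corresponding confidence interval for $EY_0$, since the term 1 dominates $\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}$.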








5.1 Refer to Exercises 3.1, 3.3, 4.1, and:
(i) Predict $EY_0$ at $x_0 = 3.25$, and construct a 95% confidence interval for
$EY_0$.
(ii) Predict the response $Y_0$ at $x_0 = 3.25$, and construct a 95% prediction
interval for $Y_0$.


5.2 In reference to Example 22 in Chapter 1 (see also scatter diagram in
Figure 13.1 and Examples 1, 2, 3, and 6 here), do the following:
(i) Predict $EY_0$, where $Y_0$ is the response at $x_0 = 3.25$.
(ii) Construct a 95% confidence interval for $EY_0 = \beta_1 + \beta_2 x_0 = \beta_1 +
3.25\beta_2$.
(iii) Predict the response $Y_0$ at $x_0 = 2.5$.
(iv) Construct a 90% prediction interval for $Y_0$.


5.3 In reference to Example 23 in Chapter 1 (see also Examples 4 and 7 here),
do the following:
(i) Predict $EY_0$, where $Y_0$ is the response at $x_0 = 12$.
(ii) Construct a 95% confidence interval for $EY_0 = \beta_1 + \beta_2 x_0 = \beta_1 + 12\beta_2$.
(iii) Predict the response $Y_0$ at $x_0 = 12$.
(iv) Construct a 95% prediction interval for $Y_0$.


5.4 Refer to Example 5 and:
(i) Predict $EY_0$ at $x_0 = 6$.
(ii) Construct a 95% confidence interval for $EY_0 = \beta_1 + 6\beta_2$.
(iii) Predict the response $Y_0$ at $x_0 = 6$.
(iv) Construct a 95% prediction interval for $Y_0$.


5.5 Suppose that the data given in the table below follow model (18).

x | 5    | 10   | 15   | 20   | 25   | 30
y | 0.10 | 0.21 | 0.30 | 0.35 | 0.44 | 0.62
































(i) Determine the MLE's (LSE's) of $\beta_1$, $\beta_2$, and $\sigma^2$.
(ii) Construct 95% confidence intervals for $\beta_1$, $\beta_2$, and $\sigma^2$.
(iii) At $x_0 = 17$, predict both $EY_0$ and $Y_0$ (the respective observation at
$x_0$), and construct a 95% confidence interval and prediction interval,
respectively, for them.


Hint: For a confidence interval for $\sigma^2$, see Exercise 3.5.

5.6 The following table gives the reciprocal temperatures $x$ and the corresponding
observed solubilities of a certain chemical substance, and assume
that they follow model (18).

x | 3.80 | 3.72 | 3.67 | 3.60 | 3.54
  | 1.27 | 1.20 | 1.10 | 0.82 | 0.65
y | 1.32 | 1.26 | 1.07 | 0.84 | 0.57
  | 1.50 |      |      | 0.80 | 0.62
































(i) Determine the MLE's (LSE's) of $\beta_1$, $\beta_2$, and $\sigma^2$.
(ii) Construct 95% confidence intervals for $\beta_1$, $\beta_2$, and $\sigma^2$.
(iii) At $x_0 = 3.77$, predict both $EY_0$ and $Y_0$ (the respective observation at
$x_0$), and construct a 95% confidence interval and prediction interval,
respectively, for them.


Note: Here $n = 13$ and $x_1 = x_2 = x_3$, $x_4 = x_5$, $x_6 = x_7$, $x_8 = x_9 =
x_{10}$, and $x_{11} = x_{12} = x_{13}$.


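13.6 Proof of Theorem 5


This section is solely devoted to justifying Theorem 5. Its proof is presented
in considerable detail, and it makes use of some linear algebra results. The
sources of those results are cited.


PROOF OF THEOREM 5 For later use, let us set

$$U_i = Y_i - \beta_1 - \beta_2 x_i, \quad\text{so that}\quad \bar U = \bar Y - \beta_1 - \beta_2\bar x, \tag{36}$$

and

$$U_i - \bar U = (Y_i - \bar Y) - \beta_2(x_i - \bar x) \quad\text{and}\quad Y_i - \bar Y = (U_i - \bar U) + \beta_2(x_i - \bar x). \tag{37}$$

Then, by (10),

$$\hat\beta_2\,SS_x = SS_{xy} = \sum_i (x_i - \bar x)(Y_i - \bar Y), \tag{38}$$

so that

$$(\hat\beta_2 - \beta_2)SS_x = \sum_i (x_i - \bar x)(Y_i - \bar Y) - \beta_2\,SS_x$$

$$= \sum_i (x_i - \bar x)\big[(U_i - \bar U) + \beta_2(x_i - \bar x)\big] - \beta_2\,SS_x$$

$$= \sum_i (x_i - \bar x)(U_i - \bar U) + \beta_2\,SS_x - \beta_2\,SS_x$$

$$= \sum_i (x_i - \bar x)(U_i - \bar U). \tag{39}$$

Next,

$$SS_E = \sum_i (Y_i - \hat Y_i)^2 = \sum_i (Y_i - \hat\beta_1 - \hat\beta_2 x_i)^2$$

$$= \sum_i (Y_i - \bar Y + \hat\beta_2\bar x - \hat\beta_2 x_i)^2 \quad\text{(by (8))}$$

$$= \sum_i \big[(Y_i - \bar Y) - \hat\beta_2(x_i - \bar x)\big]^2$$

$$= \sum_i \big[(Y_i - \bar Y) - \hat\beta_2(x_i - \bar x) + \beta_2(x_i - \bar x) - \beta_2(x_i - \bar x)\big]^2$$

$$= \sum_i \big\{\big[(Y_i - \bar Y) - \beta_2(x_i - \bar x)\big] - (\hat\beta_2 - \beta_2)(x_i - \bar x)\big\}^2$$

$$= \sum_i \big[(U_i - \bar U) - (\hat\beta_2 - \beta_2)(x_i - \bar x)\big]^2 \quad\text{(by (37))}$$

$$= \sum_i (U_i - \bar U)^2 + (\hat\beta_2 - \beta_2)^2 SS_x - 2(\hat\beta_2 - \beta_2)\sum_i (x_i - \bar x)(U_i - \bar U)$$

$$= \sum_i (U_i - \bar U)^2 + (\hat\beta_2 - \beta_2)^2 SS_x - 2(\hat\beta_2 - \beta_2)^2 SS_x \quad\text{(by (39))}$$

$$= \sum_i (U_i - \bar U)^2 - (\hat\beta_2 - \beta_2)^2 SS_x$$

$$= \sum_i U_i^2 - n\bar U^2 - (\hat\beta_2 - \beta_2)^2 SS_x; \text{ i.e.,}$$

$$SS_E = \sum_i U_i^2 - n\bar U^2 - (\hat\beta_2 - \beta_2)^2 SS_x. \tag{40}$$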


From (18) and (36), we have that the r.v.'s $U_1, \ldots, U_n$ are independent and
distributed as $N(0, \sigma^2)$. Transform them into the r.v.'s $V_1, \ldots, V_n$ by means of
an orthogonal transformation $C$ as described below (see also Remark 4):

$$C = \begin{pmatrix} \dfrac{x_1 - \bar x}{\sqrt{SS_x}} & \dfrac{x_2 - \bar x}{\sqrt{SS_x}} & \cdots & \dfrac{x_n - \bar x}{\sqrt{SS_x}} \\[2ex] \dfrac{1}{\sqrt n} & \dfrac{1}{\sqrt n} & \cdots & \dfrac{1}{\sqrt n} \\[1ex] \multicolumn{4}{c}{\text{(whatever, subject to the restriction that } C \text{ is orthogonal)}} \end{pmatrix}$$

That is, with “$'$” standing for transpose, we have:

$$(V_1, V_2, \ldots, V_n)' = C\,(U_1, U_2, \ldots, U_n)'. \tag{41}$$

Then, by Theorem 8 in Chapter 8, the r.v.'s $V_1, \ldots, V_n$ are independent and
distributed as $N(0, \sigma^2)$, whereas by relation (21) in the same chapter

$$\sum_i V_i^2 = \sum_i U_i^2. \tag{42}$$

From (41),

$$V_1 = \frac{1}{\sqrt{SS_x}}\sum_i (x_i - \bar x)U_i, \qquad V_2 = \frac{1}{\sqrt n}\sum_i U_i = \sqrt n\,\bar U. \tag{43}$$

But

$$\sum_i (x_i - \bar x)U_i = \sum_i (x_i - \bar x)(U_i - \bar U) = (\hat\beta_2 - \beta_2)SS_x \quad\text{(by (39))},$$

so that

$$V_1 = (\hat\beta_2 - \beta_2)\sqrt{SS_x}, \qquad V_1^2 = (\hat\beta_2 - \beta_2)^2 SS_x, \quad\text{and}\quad V_2^2 = n\bar U^2. \tag{44}$$

Then, from relations (40), (42), and (44), it follows that

$$SS_E = \sum_{i=1}^n V_i^2 - V_1^2 - V_2^2 = \sum_{i=3}^n V_i^2. \tag{45}$$

We now proceed with the justifications of parts (i) and (ii) of the theorem.

(i) From (45), $\frac{SS_E}{\sigma^2} = \sum_{i=3}^n \left(\frac{V_i}{\sigma}\right)^2 \sim \chi^2_{n-2}$, since $\frac{V_i}{\sigma}$, $i = 1, \ldots, n$, are independent
and distributed as $N(0, 1)$.

(ii) (a) From (44) and (45), $\hat\beta_2$ and $SS_E$ are functions of nonoverlapping $V_i$'s
(of $V_1$ the former, and of $V_3, \ldots, V_n$ the latter). Thus, $SS_E$ and $\hat\beta_2$ are
independent.

(b) By (36) and (43), $\bar Y = \bar U + (\beta_1 + \beta_2\bar x) = \frac{V_2}{\sqrt n} + \beta_1 + \beta_2\bar x$, so that $\bar Y$
is a function of $V_2$, and recall that $\hat\beta_2$ is a function of $V_1$. Then the
independence of $\bar Y$ and $\hat\beta_2$ follows.

(c) As was seen in (a) and (b), $SS_E$ is a function of $V_3, \ldots, V_n$; $\bar Y$ is a
function of $V_2$; and $\hat\beta_2$ is a function of $V_1$; i.e., they are functions of
nonoverlapping $V_i$'s, and therefore independent.

(d) By (8), $\hat\beta_1 = \bar Y - \hat\beta_2\bar x$ and the right-hand side is a function of $V_1$ and $V_2$
alone, by (44) and part (b). Since $SS_E$ is a function of $V_3, \ldots, V_n$, by
(45), the independence of $SS_E$ and $\hat\beta_1$ follows. ▲





REMARK 4 There is always an orthogonal matrix $C$ with the first two
rows as given above. Clearly, the vectors $r_1 = (x_1 - \bar x, \ldots, x_n - \bar x)'$ and
$r_2 = \left(\frac{1}{\sqrt n}, \ldots, \frac{1}{\sqrt n}\right)'$ are linearly independent. Then supplement them with
$n - 2$ vectors $r_3, \ldots, r_n$, so that the vectors $r_1, \ldots, r_n$ are linearly independent.
Finally, use the Gram-Schmidt orthogonalization process (which leaves $r_1$ and
$r_2$ intact) to arrive at an orthogonal matrix $C$. (See, e.g., Theorem 1.16 and
the discussion following it, on pages 33-34 of the book Linear Algebra for
Undergraduates (1957), John Wiley & Sons, by D. C. Murdoch.)
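
The construction in Remark 4 and the key identities (44) and (45) can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `orthogonal_matrix`, the choice of standard basis vectors as the supplementary vectors $r_3, \ldots, r_n$, and the small data set are all made up for the purpose. Here $r_1$ is normalized by $\sqrt{SS_x}$, so that the first row agrees with the first row of $C$.

```python
import math


def orthogonal_matrix(x):
    """Gram-Schmidt construction of an orthogonal C whose first two rows are
    (x_i - x_bar)/sqrt(SS_x) and 1/sqrt(n), in the spirit of Remark 4."""
    n = len(x)
    x_bar = sum(x) / n
    # r1 and r2 (unnormalized), followed by standard basis vectors as the
    # supplementary vectors; dependent candidates are discarded below.
    candidates = [[xi - x_bar for xi in x], [1.0] * n]
    candidates += [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    rows = []
    for v in candidates:
        w = list(v)
        for b in rows:                      # remove components along earlier rows
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-10:                    # keep only linearly independent vectors
            rows.append([wi / norm for wi in w])
        if len(rows) == n:
            break
    return rows


# Made-up illustration: known parameters and known errors U_i.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.5, -1.0, 0.3, 0.8, -0.6]
beta1, beta2 = 1.0, 2.0
y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta2_hat = ss_xy / ss_x
ss_e = ss_y - ss_xy ** 2 / ss_x

C = orthogonal_matrix(x)
v = [sum(c * ui for c, ui in zip(row, u)) for row in C]   # V = C U, as in (41)
tail = sum(vi * vi for vi in v[2:])                        # sum of V_i^2, i >= 3
print(ss_e, tail)
```

Up to floating-point error, `ss_e` and `tail` agree, as relation (45) asserts, and the first two entries of `v` match the expressions in (43) and (44).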


13.7 Concluding Remarks


In this chapter, we studied the simplest linear regression model, according to
which the response $Y$ at a point $x$ is given by $Y = \beta_1 + \beta_2 x + e$. There are
extensions of this model to different directions. First, the model may not be
linear in the parameters involved; i.e., the expectation $\eta = EY$ is not linear.
Here are some such examples.

$$\text{(i) } \eta = ae^{bx}; \qquad \text{(ii) } \eta = ax^b; \qquad \text{(iii) } \eta = \frac{1}{a + bx}; \qquad \text{(iv) } \eta = a + b\sqrt{x}.$$

It happens that these particular nonlinear models can be reduced to linear
ones by suitable transformations. Thus, in (i), taking the logarithms (always
with base $e$), we have:

$$\log\eta = \log a + bx, \quad\text{or}\quad \eta' = \beta_1 + \beta_2 x',$$

where $\eta' = \log\eta$, $\beta_1 = \log a$, $\beta_2 = b$, and $x' = x$, and the new model is linear.
Likewise, in (ii):

$$\log\eta = \log a + b\log x, \quad\text{or}\quad \eta' = \beta_1 + \beta_2 x',$$

where $\eta' = \log\eta$, $\beta_1 = \log a$, $\beta_2 = b$, and $x' = \log x$, and the transformed
model is linear. In (iii), simply set $\eta' = \frac{1}{\eta}$ to get $\eta' = a + bx$, or $\eta' = \beta_1 + \beta_2 x'$,
where $\beta_1 = a$, $\beta_2 = b$, and $x' = x$. Finally, in (iv), let $x' = \sqrt{x}$ in order to get
the linear model $\eta' = \beta_1 + \beta_2 x'$, with $\eta' = \eta$, $\beta_1 = a$, and $\beta_2 = b$.
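
As an illustration of the reduction in case (i), the sketch below fits $\eta = ae^{bx}$ by applying least squares to the pairs $(x, \log\eta)$ and transforming back. The function names `fit_line` and `fit_exponential` and the noise-free data are assumptions made for the example; with noisy data, least squares on the log scale is not the same as least squares on the original scale.

```python
import math


def fit_line(x, y):
    """Least squares estimates (b1, b2) for the line y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = ss_xy / ss_x
    b1 = y_bar - b2 * x_bar
    return b1, b2


def fit_exponential(x, eta):
    """Fit eta = a * exp(b * x) through the linearized model
    log(eta) = log(a) + b * x, i.e., beta1 = log(a) and beta2 = b."""
    b1, b2 = fit_line(x, [math.log(v) for v in eta])
    return math.exp(b1), b2


# Noise-free illustration with a = 2 and b = 0.5 (made-up values).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
eta = [2.0 * math.exp(0.5 * xi) for xi in x]
a_hat, b_hat = fit_exponential(x, eta)
print(a_hat, b_hat)
```

The other three cases follow the same pattern, with the transformation of $x$ or $\eta$ swapped for the one indicated in the text.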

Another direction of a generalization is the consideration of the so-called
multiple regression linear models. In such models, there is more than one
input variable $x$ and more than two parameters $\beta_1$ and $\beta_2$. This simply reflects
the fact that the response is influenced by more than one factor each time. For
example, the observation may be the systolic blood pressure of the individual
in a certain group, and the influencing factors may be weight and age. The
general form of a multiple regression linear model is as follows:

$$Y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + \cdots + x_{pi}\beta_p + e_i, \quad i = 1, \ldots, n,$$

and the assumptions attached to it are similar to those used in model (18). The
analysis of such a model can be done, in principle, along the same lines as those
used in analyzing model (18). However, the analysis becomes unwieldy and
one has to employ, most efficiently, linear algebra methodology. Such models
are referred to as general linear models in the statistical literature, and they
have proved very useful in a host of applications. The theoretical study of such
models can be found, e.g., in Chapter 16 of the book A Course in Mathematical
Statistics, 2nd edition (1997), Academic Press, by G. G. Roussas.









Chapter 14

Two Models of Analysis of Variance


This chapter is about statistical analysis of certain statistical models referred
to as Analysis of Variance (ANOVA). There is a great variety of such models,
and their detailed study constitutes an interesting branch of statistics. What is
done presently is to introduce two of the simplest models of ANOVA, underline
the basic concepts involved, and proceed with the analysis of the proposed
models.

The first section is devoted to the study of the one-way layout ANOVA with 
the same number of observations for each combination of the factors involved 
(cells). The study consists in providing a motivation for the model, in deriving 
the MLE’s of its parameters, and in testing an important hypothesis. In the 
process of doing so, an explanation is provided for the term ANOVA. Also, 
several technical results necessary for the analysis are established. 

In the second section of the chapter, we construct confidence intervals 
for all so-called contrasts among the (mean) parameters of the model in 
Section 14.1. 

Section 14.3 is a generalization of the model studied in the first section, in 
that the outcome of an experiment is due to two factors. Again, a motivation 
is provided for the model finally adopted, and then its statistical analysis is
discussed. This analysis consists in deriving the MLE's of the parameters of 
the model, and also in testing two hypotheses reflecting the actual influence, 
or lack thereof, of the factors involved in the outcome of the underlying exper- 
iment. Again, in the process of the analysis, an explanation is provided for the 
term ANOVA. Also, a substantial number of technical results are stated that 
are necessary for the analysis. Their proofs are deferred to a final subsection 
of this section in order not to disrupt the continuity of arguments. 

In all sections, relevant examples are discussed in detail in order to clarify
the underlying ideas and apply the results obtained.

14.1 One-Way Layout with the Same Number of Observations per Cell


In this section, we derive the MLE's of the parameters $\mu_i$, $i = 1, \ldots, I$, and
$\sigma^2$ described in relation (13) of Chapter 8. Next, we consider the problem of
testing the null hypothesis $H_0$: $\mu_1 = \cdots = \mu_I = \mu$ (unspecified) for which
the MLE's of the parameters $\mu$ and $\sigma^2$ are to be derived under $H_0$. Then we
set up the likelihood ratio test, which turns out to be an $F$-test. For the justification
of this fact, we have to split sums of squares of variations in a certain
way. Actually, it is this splitting from which the name ANOVA derives. Furthermore,
the splitting provides insight into what is happening behind the formal
analysis.





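14.1.1 The MLE's of the Parameters of the Model


First, $Y_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, I$, $j = 1, \ldots, J$, and all these r.v.'s are independent.
Then their likelihood function, to be denoted by $L(y; \mu, \sigma^2)$, is given
by the expression below, where $y = (y_{11}, \ldots, y_{IJ})$ and $\mu = (\mu_1, \ldots, \mu_I)$:

$$L(y; \mu, \sigma^2) = \prod_{i,j}\left\{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(y_{ij} - \mu_i)^2\right]\right\}$$

$$= \prod_i\prod_j\left\{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(y_{ij} - \mu_i)^2\right]\right\}$$

$$= \prod_i\left\{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{J}\exp\left[-\frac{1}{2\sigma^2}\sum_j (y_{ij} - \mu_i)^2\right]\right\}$$

$$= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{IJ}\prod_i\exp\left[-\frac{1}{2\sigma^2}\sum_j (y_{ij} - \mu_i)^2\right]$$

$$= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{IJ}\exp\left[-\frac{1}{2\sigma^2}\sum_i\sum_j (y_{ij} - \mu_i)^2\right];$$

following common practice, we do not explicitly indicate the range of $i$ and $j$,
since no confusion is possible. Hence

$$\log L(y; \mu, \sigma^2) = -\frac{IJ}{2}\log(2\pi) - \frac{IJ}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i\sum_j (y_{ij} - \mu_i)^2. \tag{1}$$

From (1), we see that, for each fixed $\sigma^2$, the log-likelihood is maximized
with respect to $\mu_1, \ldots, \mu_I$, if the exponent

$$S(\mu_1, \ldots, \mu_I) = \sum_i\sum_j (y_{ij} - \mu_i)^2$$

is minimized with respect to $\mu_1, \ldots, \mu_I$. By differentiation, we get

$$\frac{\partial}{\partial\mu_i}S(\mu_1, \ldots, \mu_I) = -2\sum_j y_{ij} + 2J\mu_i = 0, \quad\text{so that}\quad \mu_i = \frac{1}{J}\sum_j y_{ij}, \tag{2}$$

and $\frac{\partial^2}{\partial\mu_i^2}S(\mu_1, \ldots, \mu_I) = 2J$. The resulting $I \times I$ diagonal matrix is positive
definite, since, for all $\lambda_1, \ldots, \lambda_I$ with $\lambda_1^2 + \cdots + \lambda_I^2 > 0$,

$$(\lambda_1, \ldots, \lambda_I)\begin{pmatrix} 2J & & 0 \\ & \ddots & \\ 0 & & 2J \end{pmatrix}\begin{pmatrix}\lambda_1 \\ \vdots \\ \lambda_I\end{pmatrix} = (2J\lambda_1, \ldots, 2J\lambda_I)\begin{pmatrix}\lambda_1 \\ \vdots \\ \lambda_I\end{pmatrix} = 2J(\lambda_1^2 + \cdots + \lambda_I^2) > 0.$$

It follows that the values of the $\mu_i$'s given in (2) are, indeed, the MLE's of the
$\mu_i$'s. That is,

$$\hat\mu_i = y_{i.}, \quad\text{where}\quad y_{i.} = \frac{1}{J}\sum_j y_{ij}, \quad i = 1, \ldots, I. \tag{3}$$

Now, in (1), replace the exponent by $\hat S = \sum_i\sum_j (y_{ij} - y_{i.})^2$ to obtain in
obvious notation

$$\log L(y; \hat\mu, \sigma^2) = -\frac{IJ}{2}\log(2\pi) - \frac{IJ}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\hat S. \tag{4}$$

Differentiating with respect to $\sigma^2$ and equating to 0, we get

$$\frac{d}{d\sigma^2}\log L(y; \hat\mu, \sigma^2) = -\frac{IJ}{2\sigma^2} + \frac{\hat S}{2\sigma^4} = 0, \quad\text{or}\quad \sigma^2 = \frac{\hat S}{IJ}. \tag{5}$$

Since $\frac{d^2}{d(\sigma^2)^2}\log L(y; \hat\mu, \sigma^2) = \frac{IJ}{2(\sigma^2)^2} - \frac{2\hat S}{2(\sigma^2)^3}$, which evaluated at $\sigma^2 = \hat S/IJ$
gives $-\frac{(IJ)^3}{2\hat S^2} < 0$, it follows that the value of $\sigma^2$ given in (5) is, indeed, its MLE.
That is,

$$\hat\sigma^2 = \frac{1}{IJ}SS_e, \quad\text{where}\quad SS_e = \sum_i\sum_j (y_{ij} - y_{i.})^2. \tag{6}$$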


The results recorded in (3) and (6) provide the answer to the first objective.
That is, we have established the following result.

THEOREM 1
Consider the model described in relation (13) of Chapter 8; that is, $Y_{ij} =
\mu_i + e_{ij}$, where the $e_{ij}$'s are independently $\sim N(0, \sigma^2)$ r.v.'s, $i = 1, \ldots,
I\ (\geq 2)$, $j = 1, \ldots, J\ (\geq 2)$. Then the MLE's of the parameters $\mu_i$, $i =
1, \ldots, I$, and $\sigma^2$ of the model are given by (3) and (6), respectively.
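
The estimates (3) and (6) amount to row means and an averaged within-row sum of squares. A minimal sketch follows, assuming the observations are stored as a list of $I$ rows with $J$ values each; the function name `one_way_mles` and the small data set are made up for illustration.

```python
def one_way_mles(y):
    """Return the MLE's of mu_1, ..., mu_I as in (3) and of sigma^2 as in (6)
    for a one-way layout with I rows (cells) of J observations each."""
    I = len(y)
    J = len(y[0])
    cell_means = [sum(row) / J for row in y]               # mu_i-hat = y_{i.}
    ss_e = sum((yij - m) ** 2
               for row, m in zip(y, cell_means) for yij in row)
    return cell_means, ss_e / (I * J)                      # sigma^2-hat = SS_e/(IJ)


# Made-up data with I = 3 cells and J = 3 observations per cell.
data = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [5.0, 5.0, 8.0]]
mu_hat, sigma2_hat = one_way_mles(data)
print(mu_hat, sigma2_hat)
```

For this data set the cell means are 2, 4, and 6, and $SS_e = 2 + 8 + 6 = 16$, so that $\hat\sigma^2 = 16/9$.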








14.1.2 Testing the Hypothesis of Equality of Means
Next, consider the problem of testing the null hypothesis

$$H_0:\ \mu_1 = \cdots = \mu_I = \mu \text{ (unspecified)}. \tag{7}$$

Under $H_0$, the expression in (1) becomes:

$$\log L(y; \mu, \sigma^2) = -\frac{IJ}{2}\log(2\pi) - \frac{IJ}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i\sum_j (y_{ij} - \mu)^2. \tag{8}$$







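Repeating a procedure similar to the one we went through above, we derive
the MLE's of $\mu$ and $\sigma^2$ under $H_0$, to be denoted by $\hat\mu$ and $\hat\sigma^2_{H_0}$, respectively; i.e.,

$$\hat\mu = y_{\cdot\cdot}, \quad\text{where } y_{\cdot\cdot} = \frac{1}{IJ}\sum_i\sum_j y_{ij}, \qquad \hat\sigma^2_{H_0} = \frac{1}{IJ}SS_T, \quad\text{where } SS_T = \sum_i\sum_j (y_{ij}-y_{\cdot\cdot})^2. \tag{9}$$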


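We now proceed with the setting up of the likelihood ratio statistic $\lambda = \lambda(\mathbf{y})$
in order to test the hypothesis $H_0$. To this end, first observe that, under $H_0$:

$$\exp\Big[-\frac{1}{2\hat\sigma^2_{H_0}}\sum_i\sum_j (y_{ij}-y_{\cdot\cdot})^2\Big] = \exp\Big(-\frac{IJ}{2SS_T}\times SS_T\Big) = \exp\Big(-\frac{IJ}{2}\Big),$$

whereas, under no restrictions imposed,

$$\exp\Big[-\frac{1}{2\hat\sigma^2}\sum_i\sum_j (y_{ij}-y_{i\cdot})^2\Big] = \exp\Big(-\frac{IJ}{2SS_e}\times SS_e\Big) = \exp\Big(-\frac{IJ}{2}\Big).$$

Therefore, after cancellations, the likelihood ratio statistic $\lambda$ is

$$\lambda = \big(\hat\sigma^2/\hat\sigma^2_{H_0}\big)^{IJ/2}.$$

Hence $\lambda < C$, if and only if

$$\Big(\frac{\hat\sigma^2}{\hat\sigma^2_{H_0}}\Big)^{IJ/2} < C, \quad\text{or}\quad \frac{\hat\sigma^2}{\hat\sigma^2_{H_0}} < C^{2/(IJ)}, \quad\text{or}\quad \frac{\hat\sigma^2_{H_0}}{\hat\sigma^2} > 1/C^{2/(IJ)} = C_0. \tag{10}$$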
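
The cancellation leading to $\lambda$ can be checked numerically. The sketch below, on hypothetical data, evaluates the maximized log-likelihood with and without the restriction $H_0$ and compares their ratio with the closed form $(SS_e/SS_T)^{IJ/2}$; the helper name `max_log_likelihood` is an assumption made for illustration.

```python
import math


def max_log_likelihood(ss, n):
    """Normal log-likelihood maximized over sigma^2, given residual sum of squares ss.

    With sigma^2 replaced by its MLE ss / n, the exponent reduces to -n / 2.
    """
    sigma2_hat = ss / n
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2_hat) - 0.5 * n


# Hypothetical data: I = 2 groups with J = 3 observations each.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
I, J = len(groups), len(groups[0])
n = I * J
grand_mean = sum(sum(g) for g in groups) / n
ss_e = sum((y - sum(g) / J) ** 2 for g in groups for y in g)  # unrestricted fit
ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)  # fit under H0

# Ratio of maximized likelihoods: restricted (H0) over unrestricted.
lam_from_likelihoods = math.exp(max_log_likelihood(ss_t, n) - max_log_likelihood(ss_e, n))
# Closed form: (sigma2_hat / sigma2_hat_H0)^(IJ/2) = (SS_e / SS_T)^(IJ/2).
lam_closed_form = (ss_e / ss_t) ** (n / 2)
print(lam_from_likelihoods, lam_closed_form)
```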

At this point, we need the following result. 


LEMMA 1 $SS_T = SS_e + SS_H$, where $SS_e$ and $SS_T$ are given by (6) and (9),
respectively, and

$$SS_H = \sum_i\sum_j (y_{i\cdot}-y_{\cdot\cdot})^2 = J\sum_i (y_{i\cdot}-y_{\cdot\cdot})^2. \tag{11}$$


PROOF Deferred to Subsection 14.1.3.
According to this lemma, the last expression in (10) becomes:

$$\frac{SS_T}{SS_e} > C_0, \quad\text{or}\quad \frac{SS_e + SS_H}{SS_e} > C_0, \quad\text{or}\quad \frac{SS_H}{SS_e} > C_1 = C_0 - 1.$$

In other words, the likelihood ratio test rejects $H_0$ whenever

$$\frac{SS_H}{SS_e} > C_1, \quad\text{where } SS_e \text{ and } SS_H \text{ are given by (6) and (11), respectively.} \tag{12}$$

In order to determine the cutoff point $C_1$ in (12), we have to have the
distribution, under $H_0$, of the statistic $SS_H/SS_e$, where it is tacitly assumed
that the observed values have been replaced by the respective r.v.'s. For this
purpose, we need the following result.


LEMMA 2 Consider the model described in Theorem 1, and in the expressions
$SS_e$, $SS_T$, and $SS_H$ defined by (6), (9), and (11), respectively, replace
the observed values $y_{ij}$, $y_{i\cdot}$, and $y_{\cdot\cdot}$ by the r.v.'s $Y_{ij}$, $Y_{i\cdot}$, and $Y_{\cdot\cdot}$, respectively,
but retain the same notation. Then:

(i) The r.v. $SS_e/\sigma^2$ is distributed as $\chi^2_{I(J-1)}$.
(ii) The statistics $SS_e$ and $SS_H$ are independent.
Furthermore, if the null hypothesis $H_0$ defined in (7) is true, then:
(iii) The r.v. $SS_H/\sigma^2$ is distributed as $\chi^2_{I-1}$.
(iv) The statistic $\frac{SS_H/(I-1)}{SS_e/I(J-1)} \sim F_{I-1,\,I(J-1)}$.
(v) The r.v. $SS_T/\sigma^2$ is distributed as $\chi^2_{IJ-1}$.

PROOF Deferred to Subsection 14.1.3.
To this lemma, there is the following corollary, which also encompasses the
$\hat\mu_i$'s.

COROLLARY
(i) The MLE's $\hat\mu_i = Y_{i\cdot}$ are unbiased estimates of $\mu_i$, $i = 1,\ldots,I$.
(ii) The MLE $\hat\sigma^2 = SS_e/IJ$ is biased, but the estimate $MS_e = SS_e/I(J-1)$ is
unbiased.

PROOF (i) Immediate; (ii) Follows from Lemma 2(i). ▲

We may conclude that, on the basis of (12) and Lemma 2(iv), in order to
test the hypothesis stated in (7), at level of significance $\alpha$, we reject the null
hypothesis $H_0$ whenever

$$F = \frac{SS_H/(I-1)}{SS_e/I(J-1)} = \frac{MS_H}{MS_e} > F_{I-1,\,I(J-1);\,\alpha}. \tag{13}$$

So, the following result has been established.

THEOREM 2
In reference to the model described in Theorem 1, the null hypothesis
$H_0$ defined in (7) is rejected whenever the inequality in (13) holds; the
quantities $SS_e$ and $SS_H$ are given in relations (6) and (11), respectively,
and they can be computed by using the formulas in (14) below.
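
The test in (13) can be sketched as follows. This is a minimal illustration, not a library routine: the function names and the data are hypothetical, and the critical value $F_{I-1,\,I(J-1);\,\alpha}$ is assumed to be looked up in an F table and passed in by the caller.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a balanced one-way layout."""
    I, J = len(groups), len(groups[0])
    if any(len(g) != J for g in groups):
        raise ValueError("every group must contain the same number J of observations")
    means = [sum(g) / J for g in groups]
    grand = sum(sum(g) for g in groups) / (I * J)
    ss_h = J * sum((m - grand) ** 2 for m in means)                      # relation (11)
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)   # relation (6)
    ms_h = ss_h / (I - 1)
    ms_e = ss_e / (I * (J - 1))
    return ms_h / ms_e, I - 1, I * (J - 1)


def reject_h0(groups, critical_value):
    """Apply the rule in (13): reject H0 when F exceeds the tabulated cutoff."""
    f_stat, _, _ = one_way_f(groups)
    return f_stat > critical_value


# Hypothetical data: I = 2 groups, J = 3 observations each.
data = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
f_stat, df_h, df_e = one_way_f(data)
print(f_stat, df_h, df_e)
```

For these data, `reject_h0(data, c)` returns `True` for any cutoff `c` below the computed statistic; the appropriate `c` depends on the chosen level $\alpha$.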








REMARK 1 At this point, it should be recalled that the point $F_{m,n;\alpha}$ is
determined so that $P(X > F_{m,n;\alpha}) = \alpha$, where $X$ is a r.v. distributed as $F_{m,n}$; see
Figure 14.1.


REMARK 2 By rewriting analytically the relation in Lemma 1, we have that:

$$\sum_i\sum_j (y_{ij}-y_{\cdot\cdot})^2 = \sum_i\sum_j (y_{ij}-y_{i\cdot})^2 + \sum_i\sum_j (y_{i\cdot}-y_{\cdot\cdot})^2.$$

That is, the total variation of the $y_{ij}$'s with respect to the grand mean $y_{\cdot\cdot}$
is split into two parts: the variation of the $y_{ij}$'s in each $i$th group with respect
to their mean $y_{i\cdot}$ in that group (variation within groups), and the variation
of the $I$ means $y_{i\cdot}$, $i = 1,\ldots,I$ from the grand mean $y_{\cdot\cdot}$ (variation between
groups). By changing the term "variation" to "variance," we are led to the term
ANOVA. The expression $SS_e$ is also referred to as the error sum of squares for
obvious reasons, and the expression $SS_H$ is also referred to as the treatment
sum of squares, since it reflects variations due to treatment differences. The
subscript $H$ accounts for the fact that this statistic is instrumental in testing
$H_0$; see also Lemma 3 below. Finally, the expression $SS_T$ is called the total sum
of squares, again for obvious reasons.

Figure 14.1 The Graph of the p.d.f. of the $F_{m,n}$ Distribution Along with the Rejection Region of $H_0$ and the Level of Significance

The various quantities employed in carrying out the test described in (13)
are usually gathered together in the form of a table, an ANOVA table, as is done
in Table 14.1.


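Table 14.1

Source of Variance | Sums of Squares | Degrees of Freedom | Mean Squares
Between groups | $SS_H = J\sum_{i=1}^{I}(Y_{i\cdot}-Y_{\cdot\cdot})^2$ | $I-1$ | $MS_H = \frac{SS_H}{I-1}$
Within groups | $SS_e = \sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-Y_{i\cdot})^2$ | $I(J-1)$ | $MS_e = \frac{SS_e}{I(J-1)}$
Total | $SS_T = \sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-Y_{\cdot\cdot})^2$ | $IJ-1$ | (none)

REMARK 3 For computational purposes, we have:

$$SS_H = J\sum_i Y_{i\cdot}^2 - IJ\,Y_{\cdot\cdot}^2, \qquad SS_e = \sum_i\sum_j Y_{ij}^2 - J\sum_i Y_{i\cdot}^2. \tag{14}$$

Indeed,

$$SS_H = J\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 = J\sum_i Y_{i\cdot}^2 + IJ\,Y_{\cdot\cdot}^2 - 2J\,Y_{\cdot\cdot}\sum_i Y_{i\cdot} = J\sum_i Y_{i\cdot}^2 + IJ\,Y_{\cdot\cdot}^2 - 2J\,Y_{\cdot\cdot}\,I\,Y_{\cdot\cdot} = J\sum_i Y_{i\cdot}^2 - IJ\,Y_{\cdot\cdot}^2.$$

Also,

$$SS_e = \sum_i\sum_j (Y_{ij}-Y_{i\cdot})^2 = \sum_i\sum_j Y_{ij}^2 + J\sum_i Y_{i\cdot}^2 - 2\sum_i Y_{i\cdot}\sum_j Y_{ij},$$

and

$$\sum_i Y_{i\cdot}\sum_j Y_{ij} = \sum_i Y_{i\cdot}\,(J\,Y_{i\cdot}) = J\sum_i Y_{i\cdot}^2,$$

so that the result follows.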
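
The shortcut formulas in (14) and the decomposition of Lemma 1 can be verified numerically. The following sketch uses a small hypothetical data set and compares each sum of squares computed from its definition with the value given by the shortcut.

```python
import math

# Hypothetical balanced data: I = 3 groups, J = 4 observations each.
groups = [[3.0, 5.0, 4.0, 6.0], [8.0, 7.0, 9.0, 10.0], [2.0, 1.0, 4.0, 3.0]]
I, J = len(groups), len(groups[0])
means = [sum(g) / J for g in groups]
grand = sum(sum(g) for g in groups) / (I * J)

# Definitions (6), (9), and (11).
ss_e_def = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ss_t_def = sum((y - grand) ** 2 for g in groups for y in g)
ss_h_def = J * sum((m - grand) ** 2 for m in means)

# Shortcut formulas (14).
sum_of_squares = sum(y * y for g in groups for y in g)
ss_h_short = J * sum(m * m for m in means) - I * J * grand ** 2
ss_e_short = sum_of_squares - J * sum(m * m for m in means)

print(ss_h_def, ss_h_short)
print(ss_e_def, ss_e_short)
print(ss_t_def, ss_e_def + ss_h_def)  # Lemma 1: SS_T = SS_e + SS_H
```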


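EXAMPLE 1 For a numerical example, take $I = 3$, $J = 5$, and let:

$y_{11} = 82$   $y_{21} = 61$   $y_{31} = 78$
$y_{12} = 83$   $y_{22} = 62$   $y_{32} = 72$
$y_{13} = 75$   $y_{23} = 67$   $y_{33} = 74$
$y_{14} = 79$   $y_{24} = 65$   $y_{34} = 75$
$y_{15} = 78$   $y_{25} = 64$   $y_{35} = 72$.

(i) Compute the MLE of $\mu_i$, $i = 1, 2, 3$.
(ii) Compute the sums of squares $SS_H$ and $SS_e$, and also the unbiased estimate
$MS_e$ of $\sigma^2$.
(iii) Test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu$ at level of significance $\alpha = 0.05$.
(iv) Compute the MLE of $\mu$.

DISCUSSION

(i) $\hat\mu_1 = 79.4$, $\hat\mu_2 = 63.8$, $\hat\mu_3 = 74.2$.
(ii) Since $y_{i\cdot} = \hat\mu_i$, $i = 1, 2, 3$ and $y_{\cdot\cdot} \simeq 72.467$, we get, by (14):

$SS_H \simeq 79{,}402.2 - 78{,}771.991 = 630.209$,
$SS_e = 79{,}491 - 79{,}402.2 = 88.8$, and $MS_e = \frac{88.8}{12} = 7.4$.

(iii) Since $MS_H \simeq 315.105$, the test statistic is $\frac{315.105}{7.4} \simeq 42.582$. Since
$F_{2,12;0.05} = 3.8853$, the hypothesis $H_0$ is rejected. (These values use the rounded
grand mean $72.467$; carrying $y_{\cdot\cdot} = 1{,}087/15$ to full precision gives $SS_H \simeq 630.933$
and a test statistic of about $42.631$, and the conclusion is unchanged.)
(iv) Finally, $\hat\mu = y_{\cdot\cdot} \simeq 72.467$.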
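
The computations of this example can be reproduced with a short script that assembles the entries of an ANOVA table. The function name `anova_table` is hypothetical; the data and the critical value $F_{2,12;0.05} = 3.8853$ are those quoted in the text. The script works at full floating-point precision, so third decimals may differ slightly from hand calculations that round the grand mean first.

```python
def anova_table(groups):
    """Return (group means, grand mean, table) for a balanced one-way layout."""
    I, J = len(groups), len(groups[0])
    means = [sum(g) / J for g in groups]
    grand = sum(sum(g) for g in groups) / (I * J)
    ss_h = J * sum((m - grand) ** 2 for m in means)
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    table = {
        "between": {"ss": ss_h, "df": I - 1, "ms": ss_h / (I - 1)},
        "within": {"ss": ss_e, "df": I * (J - 1), "ms": ss_e / (I * (J - 1))},
        "total": {"ss": ss_h + ss_e, "df": I * J - 1},
    }
    return means, grand, table


# Data of the numerical example (I = 3, J = 5).
groups = [
    [82, 83, 75, 79, 78],
    [61, 62, 67, 65, 64],
    [78, 72, 74, 75, 72],
]
means, grand, table = anova_table(groups)
f_stat = table["between"]["ms"] / table["within"]["ms"]
F_CRIT = 3.8853  # F_{2,12;0.05}, as quoted in the text
print(means, grand)
print(table)
print(f_stat, f_stat > F_CRIT)
```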


Here is another example with data from a real experiment. 


EXAMPLE 2 In an effort to improve the quality of recording tapes, the effects of four kinds
of coatings A, B, C, D on the reproducing quality of sound are compared.
Suppose that the measurements of sound distortion given in Table 14.2 are
obtained from tapes treated with the four coatings. Look at this problem as a
one-way layout ANOVA and carry out the analysis; take as level of significance
$\alpha = 0.05$.


DISCUSSION Here $I = 4$, $J = 4$. For the MLE's of the means, we have:
$\hat\mu_1 = y_{1\cdot} = 11.25$, $\hat\mu_2 = y_{2\cdot} = 17.00$, $\hat\mu_3 = y_{3\cdot} = 15.50$, $\hat\mu_4 = y_{4\cdot} = 14.75$. Also,
$y_{\cdot\cdot} = 14.625$. Next, by (14): $SS_H = 4 \times 873.375 - 16 \times 213.890625 = 3{,}493.50 -
3{,}422.25 = 71.25$, $SS_e = 3{,}568 - 4 \times 873.375 = 3{,}568 - 3{,}493.5 = 74.50$, so
that $MS_H = 23.75$, $MS_e \simeq 6.208$. The observed value of the test statistic is:
$\frac{23.75}{6.208} \simeq 3.826$, whereas $F_{3,12;0.05} = 3.4903$. Therefore the null hypothesis about
equality of the means is rejected. Finally, $\hat\mu = y_{\cdot\cdot} = 14.625$.








Table 14.2

Coating | Observations | Mean | Grand Mean
A | 10, 15, 8, 12 | $y_{1\cdot} = 11.25$ | $y_{\cdot\cdot} = 14.625$
B | 14, 18, 21, 15 | $y_{2\cdot} = 17.00$ |
C | 17, 16, 14, 15 | $y_{3\cdot} = 15.50$ |
D | 12, 15, 17, 15 | $y_{4\cdot} = 14.75$ |








The following observations are meant to shed more light on the F test
which is used for testing the null hypothesis $H_0$. Recall that $F = \frac{MS_H}{MS_e}$ where,
by (11) and (13),

$$MS_H = \frac{J}{I-1}\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2,$$

and that $\frac{SS_H}{\sigma^2} \sim \chi^2_{I-1}$ under $H_0$, so that $E\,MS_H = \sigma^2$. It will be shown below
that, regardless of whether $H_0$ is true or not,

$$E\,MS_H = \sigma^2 + \frac{J}{I-1}\sum_i (\mu_i-\mu_\cdot)^2, \quad\text{where } \mu_\cdot = \frac{1}{I}\sum_i \mu_i. \tag{15}$$

Therefore $E\,MS_H \ge \sigma^2 = E(MS_H \mid H_0)$ and $E\,MS_H = \sigma^2$ under $H_0$; also,
$E\,MS_e = \sigma^2$. Thus, on the basis of this average criterion, it makes sense to
reject $H_0$ when $MS_H$, measured against $MS_e$, takes large values. For reference
purposes, relation (15) is stated below as a lemma.


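LEMMA 3 It holds that:

$$E\,MS_H = \frac{1}{I-1}\,E\sum_i\sum_j (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \frac{J}{I-1}\,E\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \sigma^2 + \frac{J}{I-1}\sum_i (\mu_i-\mu_\cdot)^2.$$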


PROOF Deferred to Subsection 14.1.3. 
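
Relation (15) lends itself to a quick Monte Carlo check. The sketch below is an illustration under assumed parameter values (the means, $\sigma$, $J$, the seed, and the number of replications are all hypothetical choices): it simulates the model of Theorem 1 many times and compares the average of $MS_H$ with the right-hand side of (15).

```python
import random


def simulate_ms_h(mus, sigma, J, rng):
    """Draw one balanced one-way sample and return its MS_H."""
    I = len(mus)
    # Group means Y_i. of J independent N(mu_i, sigma^2) observations.
    means = [sum(rng.gauss(mu, sigma) for _ in range(J)) / J for mu in mus]
    grand = sum(means) / I  # equals Y.. because the layout is balanced
    return J * sum((m - grand) ** 2 for m in means) / (I - 1)


rng = random.Random(12345)          # fixed seed so the run is repeatable
mus, sigma, J = [0.0, 1.0, 2.0], 1.0, 5
reps = 20000
average_ms_h = sum(simulate_ms_h(mus, sigma, J, rng) for _ in range(reps)) / reps

# Right-hand side of (15).
mu_dot = sum(mus) / len(mus)
expected = sigma ** 2 + J / (len(mus) - 1) * sum((m - mu_dot) ** 2 for m in mus)
print(average_ms_h, expected)
```

With unequal $\mu_i$'s the average of $MS_H$ settles well above $\sigma^2$, which is the intuition behind rejecting $H_0$ for large values of $MS_H/MS_e$.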


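14.1.3 Proof of Lemmas in Section 14.1

We now proceed with the justification of Lemmas 1-3 in this section.

PROOF OF LEMMA 1 We have:

$$SS_T = \sum_i\sum_j (y_{ij}-y_{\cdot\cdot})^2 = \sum_i\sum_j \big[(y_{ij}-y_{i\cdot}) + (y_{i\cdot}-y_{\cdot\cdot})\big]^2$$
$$= \sum_i\sum_j (y_{ij}-y_{i\cdot})^2 + \sum_i\sum_j (y_{i\cdot}-y_{\cdot\cdot})^2 + 2\sum_i\sum_j (y_{ij}-y_{i\cdot})(y_{i\cdot}-y_{\cdot\cdot})$$
$$= SS_e + SS_H,$$

since

$$\sum_i\sum_j (y_{ij}-y_{i\cdot})(y_{i\cdot}-y_{\cdot\cdot}) = \sum_i (y_{i\cdot}-y_{\cdot\cdot})\sum_j (y_{ij}-y_{i\cdot}) = \sum_i (y_{i\cdot}-y_{\cdot\cdot})(J y_{i\cdot} - J y_{i\cdot}) = 0. \quad ▲$$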


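PROOF OF LEMMA 2 At this point, recall that, if $X_1,\ldots,X_n$ are independent
r.v.'s distributed as $N(\mu,\sigma^2)$, then: (a) $\bar X$ and $\sum_i (X_i-\bar X)^2$ are independent;
(b) $\frac{1}{\sigma^2}\sum_i (X_i-\bar X)^2 \sim \chi^2_{n-1}$. Apply these results as follows:

(i) For each $i = 1,\ldots,I$, $\frac{1}{\sigma^2}\sum_j (Y_{ij}-Y_{i\cdot})^2 \sim \chi^2_{J-1}$ by (b) above. Furthermore,
for $i' \ne i$, $\sum_j (Y_{i'j}-Y_{i'\cdot})^2$ and $\sum_j (Y_{ij}-Y_{i\cdot})^2$ are independent, since
they are defined (separately) on sets of independent r.v.'s. It follows that:

$$\frac{SS_e}{\sigma^2} = \frac{1}{\sigma^2}\sum_i\sum_j (Y_{ij}-Y_{i\cdot})^2 = \sum_i \frac{1}{\sigma^2}\sum_j (Y_{ij}-Y_{i\cdot})^2 \sim \chi^2_{I(J-1)}.$$

(ii) For each $i = 1,\ldots,I$, the independent r.v.'s $Y_{i1},\ldots,Y_{iJ} \sim N(\mu_i,\sigma^2)$.
Hence, by (a) above, $\sum_j (Y_{ij}-Y_{i\cdot})^2$ and $Y_{i\cdot}$ are independent. Furthermore,
$\sum_j (Y_{ij}-Y_{i\cdot})^2$ and $Y_{i'\cdot}$ are also independent for $i' \ne i$, because $Y_{i'\cdot}$ is
defined on a set of r.v.'s which are independent of $\sum_j (Y_{ij}-Y_{i\cdot})^2$. Thus, each
of the statistics $\sum_j (Y_{ij}-Y_{i\cdot})^2$, $i = 1,\ldots,I$ is independent of the statistics
$Y_{1\cdot},\ldots,Y_{I\cdot}$, and the statistics $\sum_j (Y_{ij}-Y_{i\cdot})^2$, $i = 1,\ldots,I$ are independent,
as was seen in part (i). It follows that the sets $\sum_j (Y_{ij}-Y_{i\cdot})^2$, $i = 1,\ldots,I$
and $Y_{i\cdot}$, $i = 1,\ldots,I$ are independent. Then so are functions defined
(separately) on them. In particular, the functions $\sum_i\sum_j (Y_{ij}-Y_{i\cdot})^2$ and
$\sum_i\sum_j (Y_{i\cdot}-Y_{\cdot\cdot})^2$ are independent, or $SS_e$ and $SS_H$ are independent.

(iii) Under $H_0$, the r.v.'s $Y_{1\cdot},\ldots,Y_{I\cdot}$ are independent and distributed as
$N(\mu,\sigma^2/J)$. Therefore $\frac{J}{\sigma^2}\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 \sim \chi^2_{I-1}$. Since $\frac{J}{\sigma^2}\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 =
\frac{1}{\sigma^2}\sum_i\sum_j (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \frac{1}{\sigma^2}SS_H$, the result follows.

(iv) It follows from parts (i)-(iii) and the definition of the F distribution.

(v) Under $H_0$, the r.v.'s $Y_{ij}$, $i = 1,\ldots,I$, $j = 1,\ldots,J$ are independent and
distributed as $N(\mu,\sigma^2)$. Then, by (b) above, $\frac{1}{\sigma^2}\sum_i\sum_j (Y_{ij}-Y_{\cdot\cdot})^2$ is
distributed as $\chi^2_{IJ-1}$, or $\frac{SS_T}{\sigma^2} \sim \chi^2_{IJ-1}$. ▲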


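PROOF OF LEMMA 3 Before taking expectations, work with $\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2$
and rewrite it in a convenient form; namely,

$$\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \sum_i \big[(Y_{i\cdot}-\mu_\cdot) - (Y_{\cdot\cdot}-\mu_\cdot)\big]^2 = \sum_i (Y_{i\cdot}-\mu_\cdot)^2 - I(Y_{\cdot\cdot}-\mu_\cdot)^2,$$

because

$$\sum_i (Y_{i\cdot}-\mu_\cdot) = \sum_i Y_{i\cdot} - I\mu_\cdot = \frac{1}{J}\sum_i\sum_j Y_{ij} - I\mu_\cdot = I\,Y_{\cdot\cdot} - I\mu_\cdot = I(Y_{\cdot\cdot}-\mu_\cdot),$$

so that

$$-2(Y_{\cdot\cdot}-\mu_\cdot)\sum_i (Y_{i\cdot}-\mu_\cdot) = -2(Y_{\cdot\cdot}-\mu_\cdot)\times I(Y_{\cdot\cdot}-\mu_\cdot) = -2I(Y_{\cdot\cdot}-\mu_\cdot)^2.$$

So,

$$E\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \sum_i E(Y_{i\cdot}-\mu_\cdot)^2 - I\,E(Y_{\cdot\cdot}-\mu_\cdot)^2 = \sum_i E(Y_{i\cdot}-\mu_\cdot)^2 - I\,\mathrm{Var}(Y_{\cdot\cdot})$$
$$= \sum_i E(Y_{i\cdot}-\mu_\cdot)^2 - I\,\frac{\sigma^2}{IJ} = \sum_i E(Y_{i\cdot}-\mu_\cdot)^2 - \frac{\sigma^2}{J},$$

and

$$E(Y_{i\cdot}-\mu_\cdot)^2 = E\big[(Y_{i\cdot}-\mu_i) + (\mu_i-\mu_\cdot)\big]^2 = E(Y_{i\cdot}-\mu_i)^2 + (\mu_i-\mu_\cdot)^2 = \mathrm{Var}(Y_{i\cdot}) + (\mu_i-\mu_\cdot)^2 = \frac{\sigma^2}{J} + (\mu_i-\mu_\cdot)^2.$$

Therefore

$$E\sum_i (Y_{i\cdot}-Y_{\cdot\cdot})^2 = \frac{I\sigma^2}{J} - \frac{\sigma^2}{J} + \sum_i (\mu_i-\mu_\cdot)^2 = \frac{I-1}{J}\sigma^2 + \sum_i (\mu_i-\mu_\cdot)^2,$$

and, by (11) and (13),

$$E\,MS_H = \sigma^2 + \frac{J}{I-1}\sum_i (\mu_i-\mu_\cdot)^2, \quad\text{which is (15).} \quad ▲$$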


REMARK 4 In closing this section, it should be pointed out that an obvious
generalization of what was done here is to have different $J$'s for each $i =
1,\ldots,I$; i.e., for each $i$, we have $J_i$ observations. The analysis conceptually
remains the same, only one would have to carry along the $J_i$'s as opposed to
one $J$.
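
The generalization mentioned in Remark 4 can be sketched as follows. The formulas used here are the usual unbalanced one-way layout expressions, which the text does not derive, so they should be read as an assumption: each group mean is weighted by its own $J_i$, and the degrees of freedom become $I-1$ and $N-I$, with $N = \sum_i J_i$. The function name and data are hypothetical.

```python
def one_way_f_unequal(groups):
    """F statistic for a one-way layout with J_i observations in group i."""
    I = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    means = [sum(g) / n_i for g, n_i in zip(groups, sizes)]
    grand = sum(sum(g) for g in groups) / N
    # Each squared deviation of a group mean is weighted by its own J_i.
    ss_h = sum(n_i * (m - grand) ** 2 for n_i, m in zip(sizes, means))
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_h, df_e = I - 1, N - I
    return (ss_h / df_h) / (ss_e / df_e), df_h, df_e


# Hypothetical data: J_1 = 3 and J_2 = 2.
f_unequal, df_h_unequal, df_e_unequal = one_way_f_unequal([[1.0, 2.0, 3.0], [4.0, 6.0]])
print(f_unequal, df_h_unequal, df_e_unequal)
```

When all $J_i$ equal a common $J$, these expressions reduce to (6), (11), and (13).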


1.1 Apply the one-way layout analysis of variance to the data given in the table
below. Take $\alpha = 0.05$.

A | B | C
10.0 | 9.1 | 9.2
11.5 | 10.3 | 8.4
11.7 | 9.4 | 9.4

























1.2 Consider the log-likelihood function (1) as it becomes under the null
hypothesis $H_0$ stated in (7), and show that the MLE's of $\mu$ and $\sigma^2$ are given
by the expressions in relation (9).

1.3 In reference to the derivation of the likelihood ratio test $\lambda$ for testing the
hypothesis $H_0$ stated in relation (7), show that $\lambda$ is, actually, given by any
one of the expressions in relation (10).

1.4 In reference to the proof of Lemma 3, show that:
(i) $E(Y_{\cdot\cdot}-\mu_\cdot)^2 = \frac{\sigma^2}{IJ}$.
(ii) $E(Y_{i\cdot}-\mu_i)^2 = \frac{\sigma^2}{J}$.
(iii) $E(Y_{i\cdot}-\mu_\cdot)^2 = E(Y_{i\cdot}-\mu_i)^2 + (\mu_i-\mu_\cdot)^2$.

14.2 A Multicomparison Method 


Refer to the one-way layout model discussed in the previous section, namely,
to the model described in Theorem 1. One of the problems we have studied
was that of testing the null hypothesis about equality of the $I$ means; i.e.,

$$H_0: \mu_1 = \cdots = \mu_I = \mu \ \text{unspecified}. \tag{16}$$

Suppose now that the hypothesis $H_0$ is rejected, as, indeed, was the case
in Examples 1 and 2. Rejection of $H_0$ simply means that not all of the $\mu_i$'s are
equal. Clearly, it would be desirable to know which of the $\mu_i$'s are responsible
for the rejection of $H_0$. It is true that we do gain some information about it
by looking at the estimates $\hat\mu_i$. However, we would like to obtain additional
information analogous to that provided by a confidence interval for a
real-valued parameter. This is the problem to examine in this section.

In relation (16), the hypothesis $H_0$ compares the parameters involved and,
actually, stipulates that they are all equal. This suggests that any attempt to
construct a confidence interval should not focus on a single parameter, but
rather on two or more parameters simultaneously. For example, we would like
to compare all possible pairs $(\mu_i, \mu_j)$ through the differences $\mu_i - \mu_j$. Or, more
generally, to compare one subset of these parameters against the complement
of this subset. Thus, in Example 1, where we have three parameters $\mu_1$, $\mu_2$, and
$\mu_3$, we may wish, e.g., to compare $\mu_1$ against $(\mu_2, \mu_3)$, or $\mu_2$ against $(\mu_1, \mu_3)$, or
$\mu_3$ against $(\mu_1, \mu_2)$. One way of doing it is to look at the respective differences:
$\mu_1 - \frac{1}{2}(\mu_2 + \mu_3)$, $\mu_2 - \frac{1}{2}(\mu_1 + \mu_3)$, $\mu_3 - \frac{1}{2}(\mu_1 + \mu_2)$.

At this point, it is to be observed that all expressions we looked at above
are of the form $c_1\mu_1 + \cdots + c_I\mu_I$ with $c_1 + \cdots + c_I = 0$. This observation leads
to the following definition.


DEFINITION 1

In reference to the model described in Theorem 1, any relation among
the parameters $\mu_1,\ldots,\mu_I$ of the form $\Psi = \sum_{i=1}^{I} c_i\mu_i$ with $\sum_{i=1}^{I} c_i = 0$ is
called a contrast among the $\mu_i$'s.

It follows from the above discussion that what would be really meaningful
here would be the construction of confidence intervals for contrasts among
the $\mu_i$'s. In particular, it would clearly be highly desirable to construct
confidence intervals for all possible contrasts among the $\mu_i$'s, which would all have
the same confidence coefficient. This is exactly the content of the theorem
stated below.
First, let us introduce some pieces of notation needed. To this end, consider
the contrast

$$\Psi = \sum_i c_i\mu_i \quad \Big(\sum_i c_i = 0\Big), \tag{17}$$

and let us estimate $\Psi$ by $\hat\Psi$, where

$$\hat\Psi = \sum_i c_i\hat\mu_i = \sum_i c_i Y_{i\cdot}. \tag{18}$$

Clearly,

$$E\hat\Psi = \Psi \quad\text{and}\quad \mathrm{Var}(\hat\Psi) = \Big(\frac{1}{J}\sum_i c_i^2\Big)\sigma^2, \tag{19}$$

and the variance is estimated, in an obvious manner, by

$$\widehat{\mathrm{Var}}(\hat\Psi) = \Big(\frac{1}{J}\sum_i c_i^2\Big)MS_e, \quad MS_e = SS_e/I(J-1), \tag{20}$$

and $SS_e$ is given in (6) (see also (14)). Finally, define $S^2$ by:

$$S^2 = (I-1)F_{I-1,\,I(J-1);\,\alpha}. \tag{21}$$

Then we have the following important result.

THEOREM 3
With the notation introduced in (17), (18), (20), and (21), the interval

$$\Big[\hat\Psi - S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)},\ \hat\Psi + S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)}\Big] \tag{22}$$

is a confidence interval with confidence coefficient $1-\alpha$ simultaneously
for all contrasts $\Psi$.
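
A minimal sketch of the interval in (22) is given below. The function name `scheffe_interval`, the group means, $MS_e$, and the value passed as `f_crit` are all hypothetical placeholders; in practice `f_crit` would be the tabulated point $F_{I-1,\,I(J-1);\,\alpha}$.

```python
import math


def scheffe_interval(c, group_means, J, ms_e, f_crit):
    """Simultaneous confidence interval (22) for the contrast sum_i c_i * mu_i.

    f_crit is the tabulated F point with I - 1 and I(J - 1) degrees of freedom.
    """
    I = len(c)
    if len(group_means) != I:
        raise ValueError("one coefficient per group mean is required")
    if abs(sum(c)) > 1e-12:
        raise ValueError("the coefficients of a contrast must sum to zero")
    psi_hat = sum(ci * m for ci, m in zip(c, group_means))   # relation (18)
    var_hat = sum(ci * ci for ci in c) / J * ms_e            # relation (20)
    s = math.sqrt((I - 1) * f_crit)                          # relation (21)
    half_width = s * math.sqrt(var_hat)
    return psi_hat - half_width, psi_hat + half_width


# Hypothetical inputs: I = 3 group means, J = 4, MS_e = 2.0, placeholder cutoff 4.26.
lower, upper = scheffe_interval([1, -1, 0], [10.0, 12.0, 17.0], J=4, ms_e=2.0, f_crit=4.26)
print(lower, upper)
```

Because the same $S$ serves every contrast, the function can be called repeatedly with different coefficient vectors without adjusting the confidence level.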











At this point, it should not come as a surprise that there is an intimate
relationship between the null hypothesis $H_0$ and confidence intervals for
contrasts. The result stated below as a lemma (but not proved!) articulates this
relationship. In its statement, we need a concept defined now.

DEFINITION 2

Let $\Psi$ and $\hat\Psi$ be as in (17) and (18), respectively. Then we say that $\hat\Psi$ is
significantly different from zero, if the interval defined in (22) does not
contain zero; equivalently, $|\hat\Psi| > S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)}$.

Then the lemma mentioned above is as follows.

LEMMA 4 The null hypothesis $H_0$ stated in (16) is rejected, if and only if
there is at least one contrast $\Psi$ for which $\hat\Psi$ is significantly different from zero.

We do not intend to pursue the proof of Theorem 3 here, which can be
found in great detail in Section 17.4 of the book A Course in Mathematical
Statistics, 2nd edition, Academic Press (1997), by G. G. Roussas. Suffice it to
say that it follows from the maximization, with respect to $c_1,\ldots,c_I$, subject
to the contrast constraint $\sum_i c_i = 0$, of the function

$$f(c_1,\ldots,c_I) = \frac{1}{\sqrt{\frac{1}{J}\sum_i c_i^2}}\sum_i c_i (Y_{i\cdot}-\mu_i),$$

and that this maximization is obtained by means of the so-called Lagrange
multipliers. In the process of doing so, we also need the following two facts.


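LEMMA 5
(i) With $\mu_\cdot = \frac{1}{I}\sum_i \mu_i$, the r.v.'s $\sum_j \big[(Y_{ij}-\mu_i) - (Y_{i\cdot}-\mu_i)\big]^2$, $i = 1,\ldots,I$ are
independent.
(ii) The r.v.'s $\sum_i \big[(Y_{i\cdot}-Y_{\cdot\cdot}) - (\mu_i-\mu_\cdot)\big]^2$ and $SS_e$ are independent.
(iii) Under the null hypothesis $H_0$, $\frac{J}{\sigma^2}\sum_i \big[(Y_{i\cdot}-Y_{\cdot\cdot}) - (\mu_i-\mu_\cdot)\big]^2 \sim \chi^2_{I-1}$.
PROOF Deferred to the end of the section.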


We now consider some examples. 


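EXAMPLE 3 In reference to Example 1, construct a 95% confidence interval for each of the
following contrasts:

$$\mu_1-\mu_2, \quad \mu_1-\mu_3, \quad \mu_2-\mu_3, \quad \mu_1-\tfrac{1}{2}(\mu_2+\mu_3), \quad \mu_2-\tfrac{1}{2}(\mu_3+\mu_1), \quad \mu_3-\tfrac{1}{2}(\mu_1+\mu_2).$$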


DISCUSSION Here $I = 3$, $J = 5$, and hence $F_{I-1,\,I(J-1);\,\alpha} = F_{2,12;0.05} =
3.8853$, $S^2 = (I-1)F_{I-1,\,I(J-1);\,\alpha} = 2 \times 3.8853 = 7.7706$ and $S \simeq 2.788$. Also,
$MS_e = 7.4$ from Example 1. From the same example, for $\Psi = \mu_1 - \mu_2$, we have
$\hat\Psi = Y_{1\cdot} - Y_{2\cdot} = 79.4 - 63.8 = 15.6$, $\widehat{\mathrm{Var}}(\hat\Psi) = \frac{2}{5} \times 7.4 = 2.96$, $\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} \simeq 1.72$
and $S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} = 2.788 \times 1.72 \simeq 4.795$. Therefore the required (observed)
confidence interval for $\mu_1 - \mu_2$ is:

[15.6 - 4.795, 15.6 + 4.795] = [10.805, 20.395].

Likewise, for $\Psi = \mu_1 - \mu_3$, we have $\hat\Psi = Y_{1\cdot} - Y_{3\cdot} = 79.4 - 74.2 = 5.2$, $\widehat{\mathrm{Var}}(\hat\Psi) =
2.96$ the same as before, and hence $S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} \simeq 4.795$. Then the required
(observed) confidence interval for $\mu_1 - \mu_3$ is:

[5.2 - 4.795, 5.2 + 4.795] = [0.405, 9.995].

Also, for $\Psi = \mu_2 - \mu_3$, we have $\hat\Psi = Y_{2\cdot} - Y_{3\cdot} = 63.8 - 74.2 = -10.4$. Since
$\widehat{\mathrm{Var}}(\hat\Psi)$ is still 2.96, the required (observed) confidence interval for $\mu_2 - \mu_3$ is:

[-10.4 - 4.795, -10.4 + 4.795] = [-15.195, -5.605].

Next, let $\Psi = \mu_1 - \frac{1}{2}(\mu_2 + \mu_3)$, so that $\hat\Psi = 79.4 - \frac{1}{2}(63.8 + 74.2) = 79.4 - 69 =
10.4$, $\widehat{\mathrm{Var}}(\hat\Psi) = \frac{3}{10} \times 7.4 = 2.22$, $\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} \simeq 1.49$ and $S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} \simeq 4.154$.
Therefore the required (observed) confidence interval is:

[10.4 - 4.154, 10.4 + 4.154] = [6.246, 14.554].

For $\Psi = \mu_2 - \frac{1}{2}(\mu_3 + \mu_1)$, we have $\hat\Psi = 63.8 - \frac{1}{2}(74.2 + 79.4) = -13$, and
therefore the required (observed) confidence interval is:

[-13 - 4.154, -13 + 4.154] = [-17.154, -8.846].

Finally, for $\Psi = \mu_3 - \frac{1}{2}(\mu_1 + \mu_2)$, we have $\hat\Psi = 74.2 - \frac{1}{2}(79.4 + 63.8) = 2.6$,
and the required (observed) confidence interval is:

[2.6 - 4.154, 2.6 + 4.154] = [-1.554, 6.754].

It is noteworthy that, of the six contrasts we have entertained in this example,
for only one contrast, $\Psi = \mu_3 - \frac{1}{2}(\mu_1 + \mu_2)$, the respective quantity $\hat\Psi = 2.6$
is not significantly different from zero. This is consonant with Lemma 4, since
we already know (from Example 1) that $H_0$ is rejected. For example, for the
contrast $\Psi = \mu_1 - \mu_2$, we found the confidence interval [10.805, 20.395], which
does not contain 0. This simply says that, at the confidence level considered,
$\mu_1$ and $\mu_2$ cannot be equal; thus, $H_0$ would have to be rejected. Likewise for
the contrasts $\mu_1 - \mu_3$ and $\mu_2 - \mu_3$.
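
The six intervals of this example can be reproduced programmatically. The sketch below takes the group means, $MS_e = 7.4$, and $F_{2,12;0.05} = 3.8853$ from the text; the dictionary keys are merely labels chosen for illustration. Since the script does not round intermediate quantities, its endpoints can differ from the hand-computed ones in the third decimal.

```python
import math

means = [79.4, 63.8, 74.2]   # group means from Example 1
I, J = 3, 5
MS_E = 7.4                   # from Example 1
F_CRIT = 3.8853              # F_{2,12;0.05}, as quoted in the text
S = math.sqrt((I - 1) * F_CRIT)

contrasts = {
    "mu1 - mu2": [1, -1, 0],
    "mu1 - mu3": [1, 0, -1],
    "mu2 - mu3": [0, 1, -1],
    "mu1 - (mu2 + mu3)/2": [1, -0.5, -0.5],
    "mu2 - (mu3 + mu1)/2": [-0.5, 1, -0.5],
    "mu3 - (mu1 + mu2)/2": [-0.5, -0.5, 1],
}

results = {}
for name, c in contrasts.items():
    psi_hat = sum(ci * m for ci, m in zip(c, means))
    half_width = S * math.sqrt(sum(ci * ci for ci in c) / J * MS_E)
    significant = abs(psi_hat) > half_width   # Definition 2
    results[name] = (psi_hat - half_width, psi_hat + half_width, significant)

for name, (lo, hi, sig) in results.items():
    print(f"{name}: [{lo:.3f}, {hi:.3f}] significant={sig}")
```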


EXAMPLE 4 In reference to Example 2, construct a 95% confidence interval for each of the
following contrasts:

$$\mu_1-\mu_2, \quad \mu_1-\mu_3, \quad \mu_1-\mu_4, \quad \mu_2-\mu_3, \quad \mu_2-\mu_4, \quad \mu_3-\mu_4.$$


DISCUSSION Here $I = J = 4$, $F_{I-1,\,I(J-1);\,\alpha} = F_{3,12;0.05} = 3.4903$, $S^2 =
(I-1)F_{I-1,\,I(J-1);\,\alpha} = 3 \times 3.4903 = 10.4709$, and $S \simeq 3.236$. For $\Psi = \mu_1 - \mu_2$,
we have $\hat\Psi = Y_{1\cdot} - Y_{2\cdot} = 11.25 - 17 = -5.75$, and $\widehat{\mathrm{Var}}(\hat\Psi) = 0.5 \times 6.208 =
3.104$, $\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} \simeq 1.762$. Thus, $S\sqrt{\widehat{\mathrm{Var}}(\hat\Psi)} = 3.236 \times 1.762 \simeq 5.702$. Then the
required (observed) confidence interval for $\mu_1 - \mu_2$ is:

[-5.75 - 5.702, -5.75 + 5.702] = [-11.452, -0.048].

For $\Psi = \mu_1 - \mu_3$, we have $\hat\Psi = 11.25 - 15.50 = -4.25$, and the required
(observed) confidence interval for $\mu_1 - \mu_3$ is:

[-4.25 - 5.702, -4.25 + 5.702] = [-9.952, 1.452].

For $\Psi = \mu_1 - \mu_4$, we have $\hat\Psi = 11.25 - 14.75 = -3.5$, and the required
(observed) confidence interval is:

[-3.5 - 5.702, -3.5 + 5.702] = [-9.202, 2.202].

For $\Psi = \mu_2 - \mu_3$, we have $\hat\Psi = 17 - 15.5 = 1.5$, and the required (observed)
confidence interval is:

[1.5 - 5.702, 1.5 + 5.702] = [-4.202, 7.202].

For $\Psi = \mu_2 - \mu_4$, we have $\hat\Psi = 17 - 14.75 = 2.25$, and the required (observed)
confidence interval is:

[2.25 - 5.702, 2.25 + 5.702] = [-3.452, 7.952].

Finally, for $\Psi = \mu_3 - \mu_4$, we have $\hat\Psi = 15.50 - 14.75 = 0.75$, and the required
(observed) confidence interval is:

[0.75 - 5.702, 0.75 + 5.702] = [-4.952, 6.452].

In this example, we have that for only one of the six contrasts considered,
$\Psi = \mu_1 - \mu_2$, the respective $\hat\Psi = -5.75$ is significantly different from zero.
This, of course, suffices for the rejection of the hypothesis $H_0$ (according to
Lemma 4), as it actually happened in Example 2. So, it appears that the means
$\mu_1$ and $\mu_2$ are the culprits here. This fact is also reflected by the estimates of
the $\mu_i$'s found in Example 2; namely,

$\hat\mu_1 = 11.25$, $\hat\mu_2 = 17.00$, $\hat\mu_3 = 15.50$, $\hat\mu_4 = 14.75$;

$\hat\mu_1$ and $\hat\mu_2$ are the furthest apart.
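
When only pairwise differences are of interest, the contrasts can be generated rather than listed by hand. The sketch below starts from the raw data of Table 14.2, uses the critical value $F_{3,12;0.05} = 3.4903$ quoted in the text, and flags the pairs whose estimated difference is significantly different from zero; the variable names are illustrative.

```python
import math
from itertools import combinations

# Data of Table 14.2 (coatings A, B, C, D), I = J = 4.
groups = [
    [10, 15, 8, 12],
    [14, 18, 21, 15],
    [17, 16, 14, 15],
    [12, 15, 17, 15],
]
I, J = len(groups), len(groups[0])
F_CRIT = 3.4903  # F_{3,12;0.05}, as quoted in the text

means = [sum(g) / J for g in groups]
ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ms_e = ss_e / (I * (J - 1))

# A pairwise contrast has coefficients +1 and -1, so sum of c_i^2 is 2.
half_width = math.sqrt((I - 1) * F_CRIT) * math.sqrt(2 / J * ms_e)

intervals = {}
for a, b in combinations(range(I), 2):
    diff = means[a] - means[b]
    intervals[(a + 1, b + 1)] = (diff - half_width, diff + half_width)

significant_pairs = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]
print(half_width)
print(intervals)
print(significant_pairs)
```

The pair $(1, 2)$ is significant only by a narrow margin, so the outcome is sensitive to the number of decimals carried in $MS_e$ and $S$.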
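This section is concluded with the presentation of a justification of
Lemma 5.

PROOF OF LEMMA 5
(i) Here

$$\sum_j \big[(Y_{ij}-\mu_i) - (Y_{i\cdot}-\mu_i)\big]^2 = \sum_j (Y_{ij}-Y_{i\cdot})^2,$$

and the statistics $\sum_j (Y_{ij}-Y_{i\cdot})^2$, $i = 1,\ldots,I$ are independent, since they
are defined (separately) on independent sets (rows) of r.v.'s.
(ii) The proof of this part is reminiscent of that of Lemma 2(ii). The
independent r.v.'s $Y_{i1}-\mu_i,\ldots,Y_{iJ}-\mu_i$ are distributed as $N(0,\sigma^2)$. Since
$\frac{1}{J}\sum_j (Y_{ij}-\mu_i) = Y_{i\cdot}-\mu_i$, it follows, by an application of (a) in the proof
of Lemma 2, that, for each $i = 1,\ldots,I$, $\sum_j \big[(Y_{ij}-\mu_i) - (Y_{i\cdot}-\mu_i)\big]^2 =
\sum_j (Y_{ij}-Y_{i\cdot})^2$ is independent of $Y_{i\cdot}-\mu_i$. For $i' \ne i$, each of $\sum_j (Y_{ij}-Y_{i\cdot})^2$
is also independent of $Y_{i'\cdot}-\mu_{i'}$. Also, by part (i), $\sum_j (Y_{ij}-Y_{i\cdot})^2$, $i = 1,\ldots,I$
are independent. It follows that the sets of statistics

$$\sum_j (Y_{ij}-Y_{i\cdot})^2,\ i = 1,\ldots,I \quad\text{and}\quad Y_{i\cdot}-\mu_i,\ i = 1,\ldots,I$$

are independent. Then so are functions defined (separately) on them. In
particular, this is true for the functions

$$\sum_i\sum_j (Y_{ij}-Y_{i\cdot})^2 = SS_e \quad\text{and}\quad \sum_i \big[(Y_{i\cdot}-\mu_i) - (Y_{\cdot\cdot}-\mu_\cdot)\big]^2 = \sum_i \big[(Y_{i\cdot}-Y_{\cdot\cdot}) - (\mu_i-\mu_\cdot)\big]^2.$$

(iii) For $i = 1,\ldots,I$, the r.v.'s $Y_{i\cdot}-\mu_i$ are independent and distributed as
$N(0,\sigma^2/J)$, so that the independent r.v.'s $\frac{\sqrt{J}}{\sigma}(Y_{i\cdot}-\mu_i)$, $i = 1,\ldots,I$ are
distributed as $N(0,1)$.
Since $\frac{1}{I}\sum_i \frac{\sqrt{J}}{\sigma}(Y_{i\cdot}-\mu_i) = \frac{\sqrt{J}}{\sigma}(Y_{\cdot\cdot}-\mu_\cdot)$, it follows (by (b) in the proof of
Lemma 2) that

$$\sum_i \Big[\frac{\sqrt{J}}{\sigma}(Y_{i\cdot}-\mu_i) - \frac{\sqrt{J}}{\sigma}(Y_{\cdot\cdot}-\mu_\cdot)\Big]^2 \sim \chi^2_{I-1},$$

or

$$\frac{J}{\sigma^2}\sum_i \big[(Y_{i\cdot}-Y_{\cdot\cdot}) - (\mu_i-\mu_\cdot)\big]^2 \sim \chi^2_{I-1}. \quad ▲$$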




2.1 Refer to Exercise 1.1, and construct 95% confidence intervals for all
contrasts of the $\mu$'s.

2.2 Refer to Exercise 1.2, and construct 95% confidence intervals for all
contrasts of the $\mu$'s.





14.3 Two-Way Layout with One Observation per Cell


In this section, we pursue the study of the kind of problems considered in 
Section 14.1, but in a more general framework. Specifically, we consider ex- 
periments whose outcomes are influenced by more than one factor. In the 
model to be analyzed here there will be two such factors, one factor occurring 
at I levels and the other factor occurring at J levels. The following example 
will help clarify the underlying ideas and the issues to be resolved. 


EXAMPLE 5 Suppose we are interested in acquiring a fairly large number of pieces of
equipment from among $I$ brands entertained. The available workforce to use the
equipment bought consists of $J$ workers. Before a purchase decision is made, an
experiment is carried out whereby each one of the $J$ workers uses each one
of the $I$ pieces of equipment for one day. It is assumed that the one day's production
would be a quantity, denoted by $\mu_{ij}$, depending on the $i$th brand of equipment
and the $j$th worker, except for an error $e_{ij}$ associated with the $i$th equipment
and the $j$th worker. Thus, the one day's outcome is, actually, an observed value
of a r.v. $Y_{ij}$, which has the following structure: $Y_{ij} = \mu_{ij} + e_{ij}$, $i = 1,\ldots,I$,
$j = 1,\ldots,J$. For the errors $e_{ij}$ the familiar assumptions are made; namely, the
r.v.'s $e_{ij}$, $i = 1,\ldots,I$, $j = 1,\ldots,J$ are independent and distributed as $N(0,\sigma^2)$.
It follows that the r.v.'s $Y_{ij}$, $i = 1,\ldots,I$, $j = 1,\ldots,J$ are independent with
$Y_{ij} \sim N(\mu_{ij},\sigma^2)$. At this point, the further reasonable assumption is made that
each mean $\mu_{ij}$ consists of three additive parts: a quantity $\mu$, the grand mean,
the same for all $i$ and $j$; an effect due to the $i$th equipment, denoted by $\alpha_i$ and
usually referred to as the row effect; and an effect due to the $j$th worker,
denoted by $\beta_j$ and usually referred to as the column effect. So, $\mu_{ij} = \mu + \alpha_i + \beta_j$.
Now, it is not unreasonable to assume that some of the $\alpha_i$ effects are positive,
some are negative, and on the whole their sum is zero. Likewise for the $\beta_j$
effects.

Gathering together the assumptions made so far, we have the following 
model. 


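$Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$, $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$, the r.v.'s
$e_{ij}$, $i = 1,\ldots,I\ (\ge 2)$, $j = 1,\ldots,J\ (\ge 2)$ are independent and distributed
as $N(0,\sigma^2)$.
It follows that the r.v.'s $Y_{ij}$, $i = 1,\ldots,I$, $j = 1,\ldots,J$
are independent with $Y_{ij} \sim N(\mu + \alpha_i + \beta_j,\sigma^2)$.
(23)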


Of course, once model (23) is arrived at, it can be detached from the specific 
example which helped motivate the model. 

In reference to model (23), the questions which arise naturally are the
following: What are the magnitudes of the grand mean $\mu$, of the row effects $\alpha_i$,
of the column effects $\beta_j$, and of the error variance $\sigma^2$? Also, are there, really, any
row effects present (does it make a difference, for the output, which equipment
is purchased)? Likewise for the column effects. In statistical terminology, the
questions posed above translate as follows: Estimate the parameters of the
model $\mu$, $\alpha_i$, $i = 1,\ldots,I$, $\beta_j$, $j = 1,\ldots,J$, and $\sigma^2$. The estimates sought will be
the MLE's, which for the parameters $\mu$, $\alpha_i$, and $\beta_j$ are also LSE's. Test the null
hypothesis of no row effects $H_{0,A}: \alpha_1 = \cdots = \alpha_I$ (and therefore $= 0$). Test the
null hypothesis of no column effects $H_{0,B}: \beta_1 = \cdots = \beta_J$ (and therefore $= 0$).


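14.3.1 The MLE's of the Parameters of the Model

The likelihood function of the $Y_{ij}$'s, to be denoted by $L(\mathbf{y};\mu,\boldsymbol{\alpha},\boldsymbol{\beta},\sigma^2)$ in
obvious notation, is given by the formula below. In this formula and in the sequel,
the precise range of $i$ and $j$ will not be indicated explicitly for notational
convenience.

$$L(\mathbf{y};\mu,\boldsymbol{\alpha},\boldsymbol{\beta},\sigma^2) = \Big(\frac{1}{\sqrt{2\pi\sigma^2}}\Big)^{IJ}\exp\Big[-\frac{1}{2\sigma^2}\sum_i\sum_j (y_{ij}-\mu-\alpha_i-\beta_j)^2\Big]. \tag{24}$$

For each fixed $\sigma^2$, maximization of the likelihood function with respect to
$\mu$, $\alpha_i$, and $\beta_j$ is equivalent to minimization, with respect to these parameters,
of the expression:

$$S(\mu,\alpha_1,\ldots,\alpha_I,\beta_1,\ldots,\beta_J) = S(\mu,\boldsymbol{\alpha},\boldsymbol{\beta}) = \sum_i\sum_j (y_{ij}-\mu-\alpha_i-\beta_j)^2. \tag{25}$$

Minimization of $S(\mu,\boldsymbol{\alpha},\boldsymbol{\beta})$ with respect to $\mu$, $\boldsymbol{\alpha}$, and $\boldsymbol{\beta}$ yields the values given
in the following result.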


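LEMMA 6 The unique minimizing values of $\mu$, $\alpha_i$, and $\beta_j$ for expression (25)
(i.e., the LSE's of $\mu$, $\alpha_i$, and $\beta_j$) are given by:

$$\hat\mu = y_{\cdot\cdot}, \quad \hat\alpha_i = y_{i\cdot}-y_{\cdot\cdot},\ i = 1,\ldots,I, \quad \hat\beta_j = y_{\cdot j}-y_{\cdot\cdot},\ j = 1,\ldots,J, \tag{26}$$

where

$$y_{i\cdot} = \frac{1}{J}\sum_j y_{ij}, \quad y_{\cdot j} = \frac{1}{I}\sum_i y_{ij}, \quad y_{\cdot\cdot} = \frac{1}{IJ}\sum_i\sum_j y_{ij}. \tag{27}$$

PROOF Deferred to Subsection 14.3.3.
For the values in (26), the log-likelihood function becomes, with obvious
notation:

$$\log L(\mathbf{y};\hat\mu,\hat{\boldsymbol{\alpha}},\hat{\boldsymbol{\beta}},\sigma^2) = -\frac{IJ}{2}\log(2\pi) - \frac{IJ}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\hat S, \tag{28}$$

where $\hat S = \sum_i\sum_j (y_{ij}-y_{i\cdot}-y_{\cdot j}+y_{\cdot\cdot})^2$. Relation (28) is of exactly the same
type as relation (4), maximization of which produced the value

$$\hat\sigma^2 = \frac{1}{IJ}\hat S = \frac{1}{IJ}\sum_i\sum_j (y_{ij}-y_{i\cdot}-y_{\cdot j}+y_{\cdot\cdot})^2. \tag{29}$$

Combining then the results in (26) and (29), we have the following result.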





THEOREM 4

Under model (23), the MLE's of the parameters of the model are given by
relations (26) and (29). Furthermore, the MLE's of $\mu$, $\alpha_i$, and $\beta_j$ are also
their LSE's.
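
The estimates in (26) and (29) are straightforward to compute from the row means, the column means, and the grand mean. The following is a minimal sketch; the function name `two_way_mle` and the $2 \times 3$ data matrix are hypothetical.

```python
def two_way_mle(y):
    """MLE's for the two-way layout with one observation per cell.

    y[i][j] is the single observation for row i and column j.
    Returns (mu_hat, alpha_hat, beta_hat, sigma2_hat).
    """
    I, J = len(y), len(y[0])
    row_means = [sum(row) / J for row in y]                                  # y_i.
    col_means = [sum(y[i][j] for i in range(I)) / I for j in range(J)]       # y_.j
    grand = sum(row_means) / I                                               # y_..
    alpha_hat = [r - grand for r in row_means]                               # relation (26)
    beta_hat = [c - grand for c in col_means]                                # relation (26)
    ss_e = sum(
        (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(I)
        for j in range(J)
    )
    return grand, alpha_hat, beta_hat, ss_e / (I * J)                        # relation (29)


# Hypothetical data: I = 2 rows, J = 3 columns.
mu_hat2, alpha_hat, beta_hat, sigma2_hat2 = two_way_mle([[1.0, 2.0, 3.0], [5.0, 7.0, 9.0]])
print(mu_hat2, alpha_hat, beta_hat, sigma2_hat2)
```

The estimated row effects and the estimated column effects each sum to zero, mirroring the side conditions of model (23).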





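14.3.2 Testing the Hypothesis of No Row or No Column Effects
First, consider the null hypothesis of no row effects; namely,

$$H_{0,A}: \alpha_1 = \cdots = \alpha_I = 0. \tag{30}$$

Under $H_{0,A}$, the likelihood function in (24), to be denoted for convenience by
$L_A(\mathbf{y};\mu,\boldsymbol{\beta},\sigma^2)$, becomes:

$$L_A(\mathbf{y};\mu,\boldsymbol{\beta},\sigma^2) = \Big(\frac{1}{\sqrt{2\pi\sigma^2}}\Big)^{IJ}\exp\Big[-\frac{1}{2\sigma^2}\sum_i\sum_j (y_{ij}-\mu-\beta_j)^2\Big]. \tag{31}$$

Maximization of this likelihood with respect to the $\beta_j$'s and $\mu$, for each fixed $\sigma^2$,
is equivalent to minimization, with respect to the $\beta_j$'s and $\mu$, of the expression:

$$S(\mu,\beta_1,\ldots,\beta_J) = S(\mu,\boldsymbol{\beta}) = \sum_i\sum_j (y_{ij}-\mu-\beta_j)^2. \tag{32}$$

Working exactly as in (25), we obtain the following MLE's, under $H_{0,A}$, to be
denoted by $\hat\mu_A$ and $\hat\beta_{j,A}$:

$$\hat\mu_A = y_{\cdot\cdot} = \hat\mu, \quad \hat\beta_{j,A} = y_{\cdot j}-y_{\cdot\cdot} = \hat\beta_j,\ j = 1,\ldots,J. \tag{33}$$

Then, repeating the steps in relation (28), we obtain the MLE of $\sigma^2$, under $H_{0,A}$:

$$\hat\sigma^2_A = \frac{1}{IJ}\sum_i\sum_j (y_{ij}-y_{\cdot j})^2. \tag{34}$$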




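The hypothesis $H_{0,A}$ will be tested by means of the likelihood ratio test. First,
observe that:

$$\exp\Big[-\frac{1}{2\hat\sigma^2_A}\sum_i\sum_j (y_{ij}-y_{\cdot j})^2\Big] = \exp\Big(-\frac{IJ}{2IJ\hat\sigma^2_A}\times IJ\hat\sigma^2_A\Big) = \exp\Big(-\frac{IJ}{2}\Big),$$

and

$$\exp\Big[-\frac{1}{2\hat\sigma^2}\sum_i\sum_j (y_{ij}-y_{i\cdot}-y_{\cdot j}+y_{\cdot\cdot})^2\Big] = \exp\Big(-\frac{IJ}{2IJ\hat\sigma^2}\times IJ\hat\sigma^2\Big) = \exp\Big(-\frac{IJ}{2}\Big).$$

Then the likelihood ratio statistic $\lambda$ is given by:

$$\lambda = \big(\hat\sigma^2/\hat\sigma^2_A\big)^{IJ/2}.$$

$$\text{Hence } \Big(\frac{\hat\sigma^2}{\hat\sigma^2_A}\Big)^{IJ/2} < C, \text{ if and only if } \frac{\hat\sigma^2_A}{\hat\sigma^2} > C_0 = 1/C^{2/(IJ)}. \tag{35}$$

At this point, use the following notation:

$$SS_e = IJ\hat\sigma^2 = \sum_i\sum_j (y_{ij}-y_{i\cdot}-y_{\cdot j}+y_{\cdot\cdot})^2, \quad SS_A = J\sum_i \hat\alpha_i^2 = J\sum_i (y_{i\cdot}-y_{\cdot\cdot})^2, \tag{36}$$

by means of which it is shown that:

LEMMA 7 With $\hat\sigma^2_A$, $SS_e$, and $SS_A$ defined by (34) and (36), it holds: $IJ\hat\sigma^2_A =
IJ\hat\sigma^2 + SS_A = SS_e + SS_A$.

PROOF Deferred to Subsection 14.3.3.
By means of this lemma, relation (35) becomes:

$$\frac{\hat\sigma^2_A}{\hat\sigma^2} = \frac{IJ\hat\sigma^2_A}{IJ\hat\sigma^2} = \frac{SS_e + SS_A}{SS_e} = 1 + \frac{SS_A}{SS_e} > C_0, \quad\text{or}\quad \frac{SS_A}{SS_e} > C_1 = C_0 - 1.$$

So, the likelihood ratio test rejects $H_{0,A}$ whenever

$$\frac{SS_A}{SS_e} > C_1, \quad\text{where } SS_A \text{ and } SS_e \text{ are given in (36).} \tag{37}$$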


For the determination of the cutoff point $C_1$ in (37), we need the
distribution of the statistic $SS_A/SS_e$ under $H_{0,A}$, where it is tacitly assumed that the
observed values have been replaced by the respective r.v.'s. For this purpose,
we establish the following result.

LEMMA 8 Consider the expressions $SS_e$ and $SS_A$ defined in (36), and replace
the observed values $y_{ij}$, $y_{i\cdot}$, $y_{\cdot j}$, and $y_{\cdot\cdot}$ by the respective r.v.'s $Y_{ij}$, $Y_{i\cdot}$, $Y_{\cdot j}$, and
$Y_{\cdot\cdot}$, but retain the same notation. Then, under model (23):

(i) The r.v. $SS_e/\sigma^2$ is distributed as $\chi^2_{(I-1)(J-1)}$.
(ii) The statistics $SS_e$ and $SS_A$ are independent.
Furthermore, if the null hypothesis $H_{0,A}$ defined in (30) is true, then:
(iii) The r.v. $SS_A/\sigma^2$ is distributed as $\chi^2_{I-1}$.
(iv) The statistic $\frac{SS_A/(I-1)}{SS_e/(I-1)(J-1)} \sim F_{I-1,\,(I-1)(J-1)}$.

PROOF Deferred to Subsection 14.3.3.
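To this lemma, there is the following corollary, which also encompasses the
estimates $\hat\mu$, $\hat\alpha_i$, and $\hat\beta_j$.

COROLLARY

(i) The MLE's $\hat\mu = Y_{\cdot\cdot}$, $\hat\alpha_i = Y_{i\cdot}-Y_{\cdot\cdot}$, $i = 1,\ldots,I$, and $\hat\beta_j = Y_{\cdot j}-Y_{\cdot\cdot}$, $j =
1,\ldots,J$ are unbiased estimates of the respective parameters $\mu$, $\alpha_i$,
and $\beta_j$.
(ii) The MLE $\hat\sigma^2 = SS_e/IJ$ of $\sigma^2$ given by (29) and (36) is biased, but the
estimate $MS_e = SS_e/(I-1)(J-1)$ is unbiased.

PROOF

(i) It is immediate from the definition of $Y_{i\cdot}$, $Y_{\cdot j}$, and $Y_{\cdot\cdot}$ as (sample) means.
(ii) From the lemma, $\frac{SS_e}{\sigma^2} \sim \chi^2_{(I-1)(J-1)}$, so that

$$E\Big(\frac{SS_e}{\sigma^2}\Big) = (I-1)(J-1), \quad\text{or}\quad E\Big[\frac{SS_e}{(I-1)(J-1)}\Big] = \sigma^2,$$

which proves the unbiasedness asserted. Also,

$$E\hat\sigma^2 = E\Big(\frac{SS_e}{IJ}\Big) = \frac{(I-1)(J-1)}{IJ}\,E\Big[\frac{SS_e}{(I-1)(J-1)}\Big] = \frac{(I-1)(J-1)}{IJ}\,\sigma^2,$$

which shows that $\hat\sigma^2$ is biased. ▲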
By means then of this lemma and relation (37), we reach the following
conclusion: The hypothesis $H_{0,A}$ is rejected at level $\alpha$ whenever

$$F_A = \frac{SS_A/(I-1)}{SS_e/(I-1)(J-1)} = \frac{MS_A}{MS_e} > F_{I-1,\,(I-1)(J-1);\,\alpha}. \tag{38}$$

Next, consider the hypothesis of no column effects; i.e.,

$$H_{0,B}: \beta_1 = \cdots = \beta_J = 0. \tag{39}$$

Then, working exactly as in (31) and (32), we obtain:

$$\hat\mu_B = y_{\cdot\cdot} = \hat\mu, \quad \hat\alpha_{i,B} = y_{i\cdot}-y_{\cdot\cdot} = \hat\alpha_i,\ i = 1,\ldots,I, \tag{40}$$

and

$$\hat\sigma^2_B = \frac{1}{IJ}\sum_i\sum_j (y_{ij}-y_{i\cdot})^2. \tag{41}$$
Thus, as in (35), the hypothesis $H_{0,B}$ is rejected whenever

$$\frac{\hat\sigma^2_B}{\hat\sigma^2} > C_0. \tag{42}$$

Set

$$SS_B = I\sum_j \hat\beta_j^2 = I\sum_j (y_{\cdot j}-y_{\cdot\cdot})^2 \tag{43}$$

and consider the following result.

LEMMA 9 With $SS_e$, $\hat\sigma^2_B$, and $SS_B$ defined by (36), (41), and (43), it holds:
$IJ\hat\sigma^2_B = IJ\hat\sigma^2 + SS_B = SS_e + SS_B$.

PROOF Deferred to Subsection 14.3.3.
By means of this lemma, relation (42) becomes, as in (37): Reject $H_{0,B}$ whenever

$$\frac{SS_B}{SS_e} > C_1, \quad\text{where } SS_e \text{ and } SS_B \text{ are given in (36) and (43).} \tag{44}$$

Finally, for the determination of the cutoff point $C_1$ in (44), a certain
distribution is needed. In other words, a lemma analogous to Lemma 8 is needed
here.


LEMMA 10 Consider the expressions $SS_e$ and $SS_B$ defined in (36) and (43),
and replace the observed values $y_{ij}$, $y_{i.}$, $y_{.j}$, and $y_{..}$ by the respective r.v.'s
$Y_{ij}$, $Y_{i.}$, $Y_{.j}$, and $Y_{..}$, but retain the same notation. Then, under model (23):

(i) The r.v. $SS_e/\sigma^2$ is distributed as $\chi^2_{(I-1)(J-1)}$.
(ii) The statistics $SS_e$ and $SS_B$ are independent.

Furthermore, if the null hypothesis $H_{0,B}$ defined in (39) is true, then:

(iii) The r.v. $SS_B/\sigma^2$ is distributed as $\chi^2_{J-1}$.

(iv) The statistic $\dfrac{SS_B/(J-1)}{SS_e/[(I-1)(J-1)]} \sim F_{J-1,\,(I-1)(J-1)}$.

PROOF Deferred to Subsection 14.3.3.
By means of this lemma and relation (44), we conclude that: The hypothesis
$H_{0,B}$ is rejected at level $\alpha$ whenever

\[ F_B = \frac{SS_B/(J-1)}{SS_e/[(I-1)(J-1)]} = \frac{MS_B}{MS_e} > F_{J-1,\,(I-1)(J-1);\,\alpha}. \tag{45} \]





For computational purposes, we need the following result.


LEMMA 11 Let $SS_e$, $SS_A$, and $SS_B$ be given by (36) and (43), and let $SS_T$ be
defined by:

\[ SS_T = \sum_i \sum_j (y_{ij} - y_{..})^2. \tag{46} \]

Then:

(i) \[ SS_A = J \sum_i y_{i.}^2 - IJ\,y_{..}^2, \qquad SS_B = I \sum_j y_{.j}^2 - IJ\,y_{..}^2, \qquad SS_T = \sum_i \sum_j y_{ij}^2 - IJ\,y_{..}^2. \tag{47} \]

(ii) \[ SS_T = SS_e + SS_A + SS_B. \tag{48} \]


PROOF Deferred to Subsection 14.3.3.
Gathering together the hypothesis-testing results obtained, we have the following theorem.


Under model (23), the hypotheses $H_{0,A}$ and $H_{0,B}$ are rejected at level
of significance $\alpha$ whenever inequalities (38) and (45), respectively, hold
true. The statistics $SS_A$, $SS_B$ are computed by means of (47), and the
statistic $SS_e$ is computed by means of (47) and (48).











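As in Section 14.1, the various quantities employed in testing the hypotheses
$H_{0,A}$ and $H_{0,B}$, and also for estimating the error variance $\sigma^2$, are gathered
together in a table, an ANOVA table, as in Table 14.3.


Table 14.3 Analysis of Variance for Two-Way Layout with One Observation per Cell

Source of Variance | Sums of Squares | Degrees of Freedom | Mean Squares
Rows | $SS_A = J\sum_{i=1}^{I} \hat\alpha_i^2 = J\sum_{i=1}^{I} (Y_{i.} - Y_{..})^2$ | $I-1$ | $MS_A = \dfrac{SS_A}{I-1}$
Columns | $SS_B = I\sum_{j=1}^{J} \hat\beta_j^2 = I\sum_{j=1}^{J} (Y_{.j} - Y_{..})^2$ | $J-1$ | $MS_B = \dfrac{SS_B}{J-1}$
Residual | $SS_e = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - Y_{i.} - Y_{.j} + Y_{..})^2$ | $(I-1)\times(J-1)$ | $MS_e = \dfrac{SS_e}{(I-1)(J-1)}$
Total | $SS_T = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - Y_{..})^2$ | $IJ-1$ |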








REMARK 5 In the present context, relation (48) is responsible for the term
ANOVA. It states that the total variation (variance) $\sum_i\sum_j (Y_{ij} - Y_{..})^2$ (with
reference to the grand sample mean $Y_{..}$) is split in three ways: one component
$\sum_i\sum_j (Y_{i.} - Y_{..})^2$ associated with the row effects (due to the row effects, or
explained by the row effects); one component $\sum_i\sum_j (Y_{.j} - Y_{..})^2$ associated
with the column effects (due to the column effects, or explained by the column
effects); and the residual component $\sum_i\sum_j (Y_{ij} - Y_{i.} - Y_{.j} + Y_{..})^2 =
\sum_i\sum_j [(Y_{ij} - Y_{..}) - (Y_{i.} - Y_{..}) - (Y_{.j} - Y_{..})]^2$ (unexplained by the row and
column effects, the sum of squares of errors).

Before embarking on the proof of the lemmas stated earlier in this section,
let us illustrate the theory developed by a couple of examples. In the first example,
we are presented with a set of numbers, not associated with any specific
experiment; in the second example, a real-life experiment is considered.





EXAMPLE 6 Apply the two-way layout ANOVA with one observation per cell for the data
given in Table 14.4; take $\alpha = 0.05$.


Table 14.4 Data for a Two-Way Layout ANOVA

          1      2      3      4      y_i.
1         3      7      5      4      19/4
2        -1      2      0      2      3/4
3         1      2      4      0      7/4
y_.j      1     11/3    3      2      y_.. = 29/12





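Here: $\hat\mu = y_{..} = \frac{29}{12} \simeq 2.417$, and:

\[
\begin{aligned}
\hat\alpha_1 &= y_{1.} - y_{..} = \tfrac{19}{4} - \tfrac{29}{12} = \tfrac{7}{3} \simeq 2.333, &
\hat\beta_1 &= y_{.1} - y_{..} = 1 - \tfrac{29}{12} = -\tfrac{17}{12} \simeq -1.417, \\
\hat\alpha_2 &= y_{2.} - y_{..} = \tfrac{3}{4} - \tfrac{29}{12} = -\tfrac{5}{3} \simeq -1.667, &
\hat\beta_2 &= y_{.2} - y_{..} = \tfrac{11}{3} - \tfrac{29}{12} = \tfrac{5}{4} = 1.25, \\
\hat\alpha_3 &= y_{3.} - y_{..} = \tfrac{7}{4} - \tfrac{29}{12} = -\tfrac{2}{3} \simeq -0.667, &
\hat\beta_3 &= y_{.3} - y_{..} = 3 - \tfrac{29}{12} = \tfrac{7}{12} \simeq 0.583, \\
 & &
\hat\beta_4 &= y_{.4} - y_{..} = 2 - \tfrac{29}{12} = -\tfrac{5}{12} \simeq -0.417.
\end{aligned}
\]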





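\[ SS_A = 4 \times \left[\left(\tfrac{19}{4}\right)^2 + \left(\tfrac{3}{4}\right)^2 + \left(\tfrac{7}{4}\right)^2\right] - 12 \times \left(\tfrac{29}{12}\right)^2 = \tfrac{104}{3} \simeq 34.667, \]

\[ SS_B = 3 \times \left[1^2 + \left(\tfrac{11}{3}\right)^2 + 3^2 + 2^2\right] - 12 \times \left(\tfrac{29}{12}\right)^2 = \tfrac{147}{12} = 12.25, \]

\[ SS_T = \left[3^2 + 7^2 + 5^2 + 4^2 + (-1)^2 + 2^2 + 0^2 + 2^2 + 1^2 + 2^2 + 4^2 + 0^2\right] - 12 \times \left(\tfrac{29}{12}\right)^2 = \tfrac{707}{12} \simeq 58.917, \]

so that

\[ SS_e = SS_T - SS_A - SS_B = \tfrac{707}{12} - \tfrac{104}{3} - \tfrac{147}{12} = 12. \]

Hence, the unbiased estimate of $\sigma^2$ is: $\dfrac{SS_e}{(I-1)(J-1)} = \dfrac{12}{2 \times 3} = 2$. Furthermore,

\[ F_A = \frac{MS_A}{MS_e} = \frac{104/(3 \times 2)}{12/6} = \frac{26}{3} \simeq 8.667, \qquad
F_B = \frac{MS_B}{MS_e} = \frac{147/(12 \times 3)}{12/6} = \frac{147}{72} \simeq 2.042. \]

Since $F_{I-1,(I-1)(J-1);\alpha} = F_{2,6;0.05} = 5.1433$ and $F_{J-1,(I-1)(J-1);\alpha} = F_{3,6;0.05} = 4.7571$, we see that the hypothesis $H_{0,A}$ is
rejected, whereas the hypothesis $H_{0,B}$ is not rejected.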
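The arithmetic of Example 6 can be retraced with exact fractions. The sketch below is only an illustration of formulas (47) and (48); the function name `two_way_anova` and the dictionary keys are hypothetical, and the entry in row 2, column 1 of Table 14.4 is taken to be $-1$, the value implied by the row mean $3/4$.

```python
from fractions import Fraction

def two_way_anova(y):
    """Sums of squares and F ratios for a two-way layout, one observation per cell."""
    I, J = len(y), len(y[0])
    row = [Fraction(sum(r), J) for r in y]                                   # y_i.
    col = [Fraction(sum(y[i][j] for i in range(I)), I) for j in range(J)]    # y_.j
    grand = Fraction(sum(map(sum, y)), I * J)                                # y_..
    correction = I * J * grand ** 2
    ss_a = J * sum(m * m for m in row) - correction           # relation (47)
    ss_b = I * sum(m * m for m in col) - correction           # relation (47)
    ss_t = sum(v * v for r in y for v in r) - correction      # relation (47)
    ss_e = ss_t - ss_a - ss_b                                 # relation (48)
    ms_e = ss_e / ((I - 1) * (J - 1))
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_T": ss_t, "SS_e": ss_e, "MS_e": ms_e,
        "F_A": (ss_a / (I - 1)) / ms_e,
        "F_B": (ss_b / (J - 1)) / ms_e,
    }

# Table 14.4; the entry -1 is the value implied by the row mean 3/4 (assumption).
table_14_4 = [[3, 7, 5, 4], [-1, 2, 0, 2], [1, 2, 4, 0]]
result = two_way_anova(table_14_4)
for name, value in result.items():
    print(name, value, float(value))
```

Working with `Fraction` rather than floats keeps the values $104/3$, $147/12$, and $707/12$ exact, so the ratios $F_A = 26/3$ and $F_B = 147/72$ can be compared directly with the cutoff points.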
EXAMPLE 7 The cutting speeds of four types of tools are being compared by using five
materials of varying degrees of hardness. The data pertaining to measurements
of cutting time in seconds are given in Table 14.5. Carry out the ANOVA for
these data; take $\alpha = 0.05$.


Table 14.5 Data for a Two-Way Layout ANOVA

          1     2     3     4     5     y_i.
1        12     2     8     1     7      6
2        20    14    17    12    17     16
3        13     7    13     8    14     11
4        11     5    10     3     6      7
y_.j     14     7    12     6    11     y_.. = 10








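Here $I = 4$, $J = 5$. From the table: $\hat\mu = 10$, and:

\[
\begin{aligned}
\hat\alpha_1 &= y_{1.} - y_{..} = 6 - 10 = -4, & \hat\beta_1 &= y_{.1} - y_{..} = 14 - 10 = 4, \\
\hat\alpha_2 &= y_{2.} - y_{..} = 16 - 10 = 6, & \hat\beta_2 &= y_{.2} - y_{..} = 7 - 10 = -3, \\
\hat\alpha_3 &= y_{3.} - y_{..} = 11 - 10 = 1, & \hat\beta_3 &= y_{.3} - y_{..} = 12 - 10 = 2, \\
\hat\alpha_4 &= y_{4.} - y_{..} = 7 - 10 = -3, & \hat\beta_4 &= y_{.4} - y_{..} = 6 - 10 = -4, \\
 & & \hat\beta_5 &= y_{.5} - y_{..} = 11 - 10 = 1.
\end{aligned}
\]

\[
\begin{aligned}
SS_A &= 5 \times (6^2 + 16^2 + 11^2 + 7^2) - 20 \times 10^2 = 310, \\
SS_B &= 4 \times (14^2 + 7^2 + 12^2 + 6^2 + 11^2) - 20 \times 10^2 = 184, \\
SS_T &= 2{,}518 - 2{,}000 = 518, \\
SS_e &= 518 - 310 - 184 = 24.
\end{aligned}
\]

Hence, the unbiased estimate for $\sigma^2$ is: $\dfrac{SS_e}{(I-1)(J-1)} = \dfrac{24}{3 \times 4} = 2$. Furthermore,

\[ F_A = \frac{MS_A}{MS_e} = \frac{310/3}{24/12} = \frac{155}{3} \simeq 51.667, \qquad
F_B = \frac{MS_B}{MS_e} = \frac{184/4}{24/12} = 23. \]

Since $F_{I-1,(I-1)(J-1);\alpha} = F_{3,12;0.05} = 3.4903$ and $F_{J-1,(I-1)(J-1);\alpha} = F_{4,12;0.05} = 3.2592$, it follows that both hypotheses
$H_{0,A}$ and $H_{0,B}$ are to be rejected. So, the mean cutting times, either for the
tools across the material cut, or for the material cut across the tools used,
cannot be assumed to be equal (at the $\alpha = 0.05$ level). Actually, this should
not come as a surprise when looking at the margins of the table, which provide
estimates of these times.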


14.3.3 Proof of Lemmas in Section 14.3


In this subsection, a justification (or an outline thereof) is provided for the 
lemmas used in this section. 




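PROOF OF LEMMA 6 Consider the expression $S(\mu, \alpha, \beta) = \sum_i \sum_j (y_{ij} - \mu - \alpha_i - \beta_j)^2$ and recall that $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$. Following the method of
Lagrange multipliers, consider the linear combination

\[ S^*(\mu, \alpha, \beta) = \sum_i \sum_j (y_{ij} - \mu - \alpha_i - \beta_j)^2 + \lambda_1 \sum_i \alpha_i + \lambda_2 \sum_j \beta_j, \]

where $\lambda_1$, $\lambda_2$ are constants, determine the partial derivatives of $S^*(\mu, \alpha, \beta)$
with respect to $\mu$, $\alpha_i$, and $\beta_j$, equate them to 0, append to them the side constraints
$\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, and solve the resulting system with respect to $\mu$,
the $\alpha_i$'s, and the $\beta_j$'s (and also $\lambda_1$, $\lambda_2$). By implementing these steps, we get:

\[
\begin{aligned}
\frac{\partial}{\partial \mu} S^*(\mu, \alpha, \beta) &= -2 \sum_i \sum_j y_{ij} + 2IJ\mu + 2J \sum_i \alpha_i + 2I \sum_j \beta_j = 0, \\
\frac{\partial}{\partial \alpha_i} S^*(\mu, \alpha, \beta) &= -2 \sum_j y_{ij} + 2J\mu + 2J\alpha_i + 2 \sum_j \beta_j + \lambda_1 = 0, \\
\frac{\partial}{\partial \beta_j} S^*(\mu, \alpha, \beta) &= -2 \sum_i y_{ij} + 2I\mu + 2 \sum_i \alpha_i + 2I\beta_j + \lambda_2 = 0, \\
\sum_i \alpha_i &= 0, \\
\sum_j \beta_j &= 0,
\end{aligned}
\]

from which we obtain:

\[ \mu = \frac{1}{IJ} \sum_i \sum_j y_{ij} = y_{..}, \qquad \alpha_i = y_{i.} - y_{..} - \frac{\lambda_1}{2J}, \qquad \beta_j = y_{.j} - y_{..} - \frac{\lambda_2}{2I}. \]

But

\[ 0 = \sum_i \alpha_i = \sum_i y_{i.} - I y_{..} - \frac{I\lambda_1}{2J} = -\frac{I\lambda_1}{2J}, \quad \text{so that } \lambda_1 = 0, \]

and likewise for $\lambda_2$ by summing up the $\beta_j$'s. Thus,

\[ \hat\mu = y_{..}, \qquad \hat\alpha_i = y_{i.} - y_{..}, \ i = 1, \ldots, I, \qquad \hat\beta_j = y_{.j} - y_{..}, \ j = 1, \ldots, J. \tag{49} \]

Now the parameter $\mu$ is any real number, the $\alpha_i$'s span an $(I-1)$-dimensional
hyperplane, and the $\beta_j$'s span a $(J-1)$-dimensional hyperplane. It is then clear
that the expression $S(\mu, \alpha, \beta)$ (as a function of $\mu$, the $\alpha_i$'s, and the $\beta_j$'s) does
not have a maximum. Then the values in (49) are candidates to produce a
minimum of $S(\mu, \alpha, \beta)$, in which case (26) follows. Again, geometrical considerations
suggest that they do produce a minimum, and we will leave it at
that presently. ▲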
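The claim that the values in (49) produce a minimum can be probed numerically. The sketch below is an informal check, not a proof: the data array `y` is hypothetical, and the helper names `S`, `ls_estimates`, and `centered` are introduced only for this illustration. It evaluates $S$ at the estimates in (49) and at randomly perturbed parameter values that still satisfy the side constraints.

```python
import random

def S(y, mu, alpha, beta):
    """The sum of squares S(mu, alpha, beta) minimized in Lemma 6."""
    return sum((y[i][j] - mu - alpha[i] - beta[j]) ** 2
               for i in range(len(y)) for j in range(len(y[0])))

def ls_estimates(y):
    """The estimates in (49): grand mean, row effects, column effects."""
    I, J = len(y), len(y[0])
    grand = sum(map(sum, y)) / (I * J)
    alpha = [sum(r) / J - grand for r in y]
    beta = [sum(y[i][j] for i in range(I)) / I - grand for j in range(J)]
    return grand, alpha, beta

def centered(v):
    """Subtract the mean so that the perturbation keeps the sum-to-zero constraint."""
    m = sum(v) / len(v)
    return [x - m for x in v]

y = [[3.0, 7.0, 5.0], [1.0, 2.0, 6.0]]        # hypothetical 2 x 3 data
mu_hat, a_hat, b_hat = ls_estimates(y)
best = S(y, mu_hat, a_hat, b_hat)

rng = random.Random(1)
perturbed = []
for _ in range(200):
    da = centered([rng.uniform(-1, 1) for _ in a_hat])
    db = centered([rng.uniform(-1, 1) for _ in b_hat])
    dm = rng.uniform(-1, 1)
    perturbed.append(S(y, mu_hat + dm,
                       [a + d for a, d in zip(a_hat, da)],
                       [b + d for b, d in zip(b_hat, db)]))

print(best, min(perturbed))
```

None of the perturbed values should fall below the value at the estimates, which is what the geometric argument in the proof suggests.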


REMARK 6 It should be mentioned at this point that ANOVA models are 
special cases of the so-called General Linear Models, and then the above mini- 
mization problem is resolved in a general setting by means of linear algebra 
methodology. For a glimpse at it, one may consult Chapter 17 in the book 
A Course in Mathematical Statistics, 2nd edition, Academic Press (1997), 
by G. G. Roussas. 
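PROOF OF LEMMA 7 Here, we have to establish the relation:

\[ \sum_i \sum_j (y_{ij} - y_{.j})^2 = J \sum_i (y_{i.} - y_{..})^2 + \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2. \]

Indeed,

\[
\begin{aligned}
SS_e &= \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2 = \sum_i \sum_j [(y_{ij} - y_{.j}) - (y_{i.} - y_{..})]^2 \\
&= \sum_i \sum_j (y_{ij} - y_{.j})^2 + J \sum_i (y_{i.} - y_{..})^2 - 2 \sum_i \sum_j (y_{i.} - y_{..})(y_{ij} - y_{.j}) \\
&= IJ\hat\sigma_A^2 + SS_A - 2SS_A = IJ\hat\sigma_A^2 - SS_A, \quad \text{because}
\end{aligned}
\]

\[
\begin{aligned}
\sum_i \sum_j (y_{i.} - y_{..})(y_{ij} - y_{.j}) &= \sum_i (y_{i.} - y_{..}) \sum_j (y_{ij} - y_{.j}) \\
&= \sum_i (y_{i.} - y_{..})(J y_{i.} - J y_{..}) = J \sum_i (y_{i.} - y_{..})^2 = SS_A. \ ▲
\end{aligned}
\]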


PROOF OF LEMMA 8 There are several ways one may attempt to justify 
the results in this lemma. One would be to refer to Lemma 2 and suggest 
that a similar approach be used, but that would do no justice. Another ap- 
proach would be to utilize the theory of quadratic forms, but that would 
require an extensive introduction to the subject and the statement and/or 
proof of a substantial number of related results. Finally, the last approach 
would be to use a geometric descriptive approach based on fundamental con- 
cepts of (finite dimensional) vector spaces. We have chosen to follow this last 
approach. 

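All vectors to be used here are column vectors, and the prime notation, "′",
indicates transpose of a vector. Set $\mathbf{Y} = (Y_{11}, \ldots, Y_{1J};\ Y_{21}, \ldots, Y_{2J};\ \ldots;\ Y_{I1}, \ldots, Y_{IJ})'$,
so that $\mathbf{Y}$ belongs in an $I \times J$-dimensional vector space to be denoted by
$V_{I \times J}$. Also, set

\[
\begin{aligned}
\boldsymbol\eta = E\mathbf{Y} &= (EY_{11}, \ldots, EY_{1J};\ EY_{21}, \ldots, EY_{2J};\ \ldots;\ EY_{I1}, \ldots, EY_{IJ})' \\
&= (\mu + \alpha_1 + \beta_1, \ldots, \mu + \alpha_1 + \beta_J;\ \mu + \alpha_2 + \beta_1, \ldots, \mu + \alpha_2 + \beta_J;\ \ldots;\ \mu + \alpha_I + \beta_1, \ldots, \mu + \alpha_I + \beta_J)'.
\end{aligned}
\]

Although the vector $\boldsymbol\eta$ has $I \times J$ coordinates, due to its form and the fact
that $\sum_i \alpha_i = \sum_j \beta_j = 0$, it follows that it lies in an $(I + J - 1)$-dimensional
space, $V_{I+J-1}$. Finally, if $H_{0,A}\colon \alpha_1 = \cdots = \alpha_I = 0$ holds, then the respective
mean vector, to be denoted by $\boldsymbol\eta_A$, is

\[ \boldsymbol\eta_A = (\mu + \beta_1, \ldots, \mu + \beta_J;\ \mu + \beta_1, \ldots, \mu + \beta_J;\ \ldots;\ \mu + \beta_1, \ldots, \mu + \beta_J)', \]

and reasoning as above, we conclude that $\boldsymbol\eta_A \in V_J$. Thus, we have three vector
spaces related as follows: $V_J \subset V_{I+J-1} \subset V_{I \times J}$.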

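It is clear that, if $\mu$, $\alpha_i$'s, and $\beta_j$'s are replaced by their (least squares) estimates
$\hat\mu$, $\hat\alpha_i$'s, and $\hat\beta_j$'s, the resulting random vector $\hat{\boldsymbol\eta}$ still lies in $V_{I+J-1}$, and likewise
for the random vector $\hat{\boldsymbol\eta}_A$, which we get if $\mu$ and $\beta_j$'s are replaced by $\hat\mu_A = \hat\mu$
and $\hat\beta_{j,A} = \hat\beta_j$; i.e., $\hat{\boldsymbol\eta}_A \in V_J$. We now proceed as follows: Let $\boldsymbol\alpha_I, \ldots, \boldsymbol\alpha_{I+J-1}$
be an orthonormal basis in $V_J$ (i.e., $\boldsymbol\alpha_i'\boldsymbol\alpha_j = 0$ for $i \ne j$ and $\|\boldsymbol\alpha_i\| = 1$), which
we extend to an orthonormal basis $\boldsymbol\alpha_1, \ldots, \boldsymbol\alpha_{I-1}, \boldsymbol\alpha_I, \ldots, \boldsymbol\alpha_{I+J-1}$ in $V_{I+J-1}$,
and then to an orthonormal basis

\[ \boldsymbol\alpha_1, \ldots, \boldsymbol\alpha_{I-1},\ \boldsymbol\alpha_I, \ldots, \boldsymbol\alpha_{I+J-1},\ \boldsymbol\alpha_{I+J}, \ldots, \boldsymbol\alpha_{I \times J} \]

in $V_{I \times J}$. This can be done, as has already been mentioned in a similar context
in the proof of Lemma 5 in Chapter 13. Also, see Remark 4 in the same chapter.
Since $\mathbf{Y} \in V_{I \times J}$, it follows that $\mathbf{Y}$ is a linear combination of the $\boldsymbol\alpha_i$'s with
coefficients some r.v.'s $Z_i$'s. That is, $\mathbf{Y} = \sum_{i=1}^{I \times J} Z_i \boldsymbol\alpha_i$. Since $\hat{\boldsymbol\eta}$ minimizes the
quantity $\|\mathbf{Y} - \boldsymbol\eta\|^2 = \sum_i \sum_j (Y_{ij} - \mu - \alpha_i - \beta_j)^2$, it follows that $\hat{\boldsymbol\eta}$ is, actually,
the projection of $\mathbf{Y}$ into the space $V_{I+J-1}$. It follows then that $\hat{\boldsymbol\eta} = \sum_{i=1}^{I+J-1} Z_i \boldsymbol\alpha_i$.
Under $H_{0,A}$, the vector $\hat{\boldsymbol\eta}_A$ minimizes $\|\mathbf{Y} - \boldsymbol\eta_A\|^2 = \sum_i \sum_j (Y_{ij} - \mu - \beta_j)^2$, and
therefore is the projection of $\mathbf{Y}$ into the space $V_J$. Thus, $\hat{\boldsymbol\eta}_A = \sum_{i=I}^{I+J-1} Z_i \boldsymbol\alpha_i$.
Then $\mathbf{Y} - \hat{\boldsymbol\eta} = \sum_{i=I+J}^{I \times J} Z_i \boldsymbol\alpha_i$, $\mathbf{Y} - \hat{\boldsymbol\eta}_A = \sum_{i=1}^{I-1} Z_i \boldsymbol\alpha_i + \sum_{i=I+J}^{I \times J} Z_i \boldsymbol\alpha_i$, and $\hat{\boldsymbol\eta} - \hat{\boldsymbol\eta}_A =
\sum_{i=1}^{I-1} Z_i \boldsymbol\alpha_i$. Because of the orthonormality of the $\boldsymbol\alpha_i$'s, it follows that:

\[ \|\mathbf{Y} - \hat{\boldsymbol\eta}\|^2 = \Big\| \sum_{i=I+J}^{I \times J} Z_i \boldsymbol\alpha_i \Big\|^2 = \sum_{i=I+J}^{I \times J} Z_i^2, \]

\[ \|\mathbf{Y} - \hat{\boldsymbol\eta}_A\|^2 = \sum_{i=1}^{I-1} Z_i^2 + \sum_{i=I+J}^{I \times J} Z_i^2, \]

and

\[ \|\hat{\boldsymbol\eta} - \hat{\boldsymbol\eta}_A\|^2 = \sum_{i=1}^{I-1} Z_i^2. \]

However,

\[ \|\mathbf{Y} - \hat{\boldsymbol\eta}\|^2 = \sum_i \sum_j (Y_{ij} - \hat\mu - \hat\alpha_i - \hat\beta_j)^2 = \sum_i \sum_j (Y_{ij} - Y_{i.} - Y_{.j} + Y_{..})^2 = SS_e, \]

and

\[ \|\hat{\boldsymbol\eta} - \hat{\boldsymbol\eta}_A\|^2 = \sum_i \sum_j \hat\alpha_i^2 = \sum_i \sum_j (Y_{i.} - Y_{..})^2 = J \sum_i (Y_{i.} - Y_{..})^2 = SS_A. \]

Therefore

\[ SS_A = \sum_{i=1}^{I-1} Z_i^2, \qquad SS_e = \sum_{i=I+J}^{I \times J} Z_i^2. \tag{50} \]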
Now, observe that the r.v.'s $Z_1, \ldots, Z_{I \times J}$ are the transformation of the r.v.'s
$Y_1, \ldots, Y_{I \times J}$ under the orthogonal matrix $P$ whose rows are the vectors $\boldsymbol\alpha_i'$'s.
This follows immediately from the relation $\mathbf{Y} = \sum_{i=1}^{I \times J} Z_i \boldsymbol\alpha_i$, if we multiply (in
the inner product sense) by $\boldsymbol\alpha_j'$. We then get $\boldsymbol\alpha_j'\mathbf{Y} = \sum_{i=1}^{I \times J} Z_i (\boldsymbol\alpha_j'\boldsymbol\alpha_i)$, and the $i$th term is
$Z_j$, if $i = j$, and 0 otherwise. So, $Z_i = \boldsymbol\alpha_i'\mathbf{Y}$, $i = 1, \ldots, I \times J$. Since the $Y_i$'s are
independent and Normally distributed with (common) variance $\sigma^2$, it follows
that the $Z_i$'s are also independently Normally distributed with specified means
and the same variance $\sigma^2$. (See Theorem 8 in Chapter 6.) From the fact that
$\boldsymbol\eta \in V_{I+J-1}$, it follows that its last $I \times J - (I + J - 1) = (I-1)(J-1)$ coordinates
(with respect to the basis $\boldsymbol\alpha_1, \ldots, \boldsymbol\alpha_{I \times J}$) are zero. Then so are the respective $EZ_i$.
That is, $Z_i$, $i = 1, \ldots, I \times J$, are independent Normal, $EZ_i = 0$ for the last
$(I-1)(J-1)$ coordinates, and they all have variance $\sigma^2$. It follows that:


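(i) $\dfrac{SS_e}{\sigma^2} = \dfrac{1}{\sigma^2} \sum_{i=I+J}^{I \times J} Z_i^2 \sim \chi^2_{(I-1)(J-1)}$.

(ii) The statistics $SS_e$ and $SS_A$ are independent, because they are defined in
terms of nonoverlapping sets of the independent r.v.'s $Z_i$'s (see relation
(50)).

(iii) The expectations of the coordinates of $\hat{\boldsymbol\eta}$ are $\mu + \alpha_i + \beta_j$, and the expectations
of the coordinates of $\hat{\boldsymbol\eta}_A$ are $\mu + \beta_j$. It follows that the expectations
of the coordinates of $\hat{\boldsymbol\eta} - \hat{\boldsymbol\eta}_A$ are $(\mu + \alpha_i + \beta_j) - (\mu + \beta_j) = \alpha_i$. Therefore,
if $H_{0,A}$ is true, these expectations are 0, and then so are the expectations
of $Z_i$, $i = 1, \ldots, I-1$, since $\hat{\boldsymbol\eta} - \hat{\boldsymbol\eta}_A = \sum_{i=1}^{I-1} Z_i \boldsymbol\alpha_i$. It follows that

\[ \frac{SS_A}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{I-1} Z_i^2 \sim \chi^2_{I-1}. \]

(iv) Immediate from parts (i)-(iii) and the definition of the $F$ distribution. ▲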
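PROOF OF LEMMA 9 We have to show that

\[ \sum_i \sum_j (y_{ij} - y_{i.})^2 = I \sum_j (y_{.j} - y_{..})^2 + \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2. \]

As in the proof of Lemma 7,

\[
\begin{aligned}
SS_e &= \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2 = \sum_i \sum_j [(y_{ij} - y_{i.}) - (y_{.j} - y_{..})]^2 \\
&= \sum_i \sum_j (y_{ij} - y_{i.})^2 + I \sum_j (y_{.j} - y_{..})^2 - 2 \sum_i \sum_j (y_{.j} - y_{..})(y_{ij} - y_{i.}) \\
&= IJ\hat\sigma_B^2 + SS_B - 2SS_B = IJ\hat\sigma_B^2 - SS_B, \quad \text{because}
\end{aligned}
\]

\[
\begin{aligned}
\sum_i \sum_j (y_{.j} - y_{..})(y_{ij} - y_{i.}) &= \sum_j (y_{.j} - y_{..}) \sum_i (y_{ij} - y_{i.}) \\
&= \sum_j (y_{.j} - y_{..})(I y_{.j} - I y_{..}) = I \sum_j (y_{.j} - y_{..})^2 = SS_B. \ ▲
\end{aligned}
\]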


PROOF OF LEMMA 10

(i) It is the same as (i) in Lemma 8.

(ii) It is done as in Lemma 8(ii), where $H_{0,A}$ is replaced by $H_{0,B}$.

(iii) Again, it is a repetition of the arguments in Lemma 8(iii).

(iv) Immediate from parts (i)-(iii). ▲


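PROOF OF LEMMA 11

(i) They are all a direct application of the identity: $\sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - n\bar X^2$.

(ii) Clearly,

\[
\begin{aligned}
SS_e &= \sum_i \sum_j (y_{ij} - y_{i.} - y_{.j} + y_{..})^2 \\
&= \sum_i \sum_j [(y_{ij} - y_{..}) - (y_{i.} - y_{..}) - (y_{.j} - y_{..})]^2 \\
&= \sum_i \sum_j (y_{ij} - y_{..})^2 + J \sum_i (y_{i.} - y_{..})^2 + I \sum_j (y_{.j} - y_{..})^2 \\
&\quad - 2 \sum_i \sum_j (y_{i.} - y_{..})(y_{ij} - y_{..}) - 2 \sum_i \sum_j (y_{.j} - y_{..})(y_{ij} - y_{..}) \\
&\quad + 2 \sum_i \sum_j (y_{i.} - y_{..})(y_{.j} - y_{..}) = SS_T - SS_A - SS_B,
\end{aligned}
\]

because

\[ \sum_i \sum_j (y_{i.} - y_{..})(y_{ij} - y_{..}) = \sum_i (y_{i.} - y_{..})(J y_{i.} - J y_{..}) = J \sum_i (y_{i.} - y_{..})^2 = SS_A, \]

\[ \sum_i \sum_j (y_{.j} - y_{..})(y_{ij} - y_{..}) = \sum_j (y_{.j} - y_{..})(I y_{.j} - I y_{..}) = I \sum_j (y_{.j} - y_{..})^2 = SS_B, \]

and

\[ \sum_i \sum_j (y_{i.} - y_{..})(y_{.j} - y_{..}) = \sum_i (y_{i.} - y_{..})(J y_{..} - J y_{..}) = 0. \ ▲ \]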
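The decomposition (48) can be confirmed numerically by computing every sum of squares directly from its definition. The sketch below does this for the cutting-time data of Table 14.5; the function name `sums_of_squares` is hypothetical, and the first row of the table is taken as 12, 2, 8, 1, 7, the values implied by the marginal means.

```python
from fractions import Fraction

# Table 14.5 (rows: tools, columns: materials). The first row is taken as
# 12, 2, 8, 1, 7, the values implied by the marginal means (an assumption).
Y = [
    [12, 2, 8, 1, 7],
    [20, 14, 17, 12, 17],
    [13, 7, 13, 8, 14],
    [11, 5, 10, 3, 6],
]

def sums_of_squares(y):
    """Return (SS_e, SS_A, SS_B, SS_T), each computed from its definition."""
    I, J = len(y), len(y[0])
    row = [Fraction(sum(r), J) for r in y]
    col = [Fraction(sum(y[i][j] for i in range(I)), I) for j in range(J)]
    grand = Fraction(sum(map(sum, y)), I * J)
    ss_a = J * sum((m - grand) ** 2 for m in row)
    ss_b = I * sum((m - grand) ** 2 for m in col)
    ss_t = sum((y[i][j] - grand) ** 2 for i in range(I) for j in range(J))
    ss_e = sum((y[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(I) for j in range(J))
    return ss_e, ss_a, ss_b, ss_t

ss_e, ss_a, ss_b, ss_t = sums_of_squares(Y)
print(ss_e, ss_a, ss_b, ss_t)
print(ss_t == ss_e + ss_a + ss_b)
```

Here $SS_e$ is obtained from the residuals $y_{ij} - y_{i.} - y_{.j} + y_{..}$ rather than by subtraction, so the final comparison is a genuine check of part (ii) of the lemma.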
i j i 


REMARK 7 In a two-way layout of ANOVA, we may have $K\ (\ge 2)$ observations
per cell rather than one. The concepts remain the same, but the analysis
is somewhat more complicated. The reader may wish to refer to Section 17.3 
in Chapter 17 of the book A Course in Mathematical Statistics, 2nd edition, 
Academic Press (1997), by G. G. Roussas. The more general cases, where there 
is an unequal number of observations per cell or there are more than two fac- 
tors influencing the outcome, are the subject matter of the ANOVA branch of 
statistics and are, usually, not discussed in an introductory course. 


3.1 Apply the two-way layout (with one observation per cell) analysis of variance
to the data given in the table below. Take $\alpha = 0.05$.

                        Levels of Factor B
Levels of Factor A      1      2      3      4      5
1                     110    128     48    123     19
2                     214    183    115    114    129
3                     208    183    130    225    114
3.2 Under the null hypothesis $H_{0,A}$ stated in relation (30), show that the MLE's
of $\mu$ and $\beta_j$, $j = 1, \ldots, J$, are, indeed, given by the expressions in relation
(33).


3.3 Under the null hypothesis $H_{0,A}$ stated in relation (30), show that the MLE
of $\sigma^2$ is, indeed, given by the expression in relation (34).


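3.4 In reference to the proof of Lemma 8, show that $\boldsymbol\eta$ is an $(I + J - 1)$-dimensional
vector.

Hint: This problem may be approached as follows.

\[
X' = \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]

(The first column consists of 1's, the next $I$ columns indicate the row level $i$, and the last $J$ columns indicate the column level $j$; the rows are arranged in $I$ blocks of $J$ rows each.)

Consider the $IJ \times (I + J + 1)$ matrix $X'$ given above and let the $1 \times (I + J + 1)$
vector $\boldsymbol\beta'$ be defined by: $\boldsymbol\beta' = (\mu, \alpha_1, \ldots, \alpha_I, \beta_1, \ldots, \beta_J)$. Then do
the following:

(i) Observe that $\boldsymbol\eta = X'\boldsymbol\beta$, so that $\boldsymbol\eta$ lies in the vector space generated by
the columns (rows) of $X'$.

(ii) For $I \ge 2$ and $J \ge \frac{I+1}{I-1}$, observe that rank $X' \le I + J + 1 = \min\{I + J + 1, IJ\}$.

(iii) Show that rank $X' = I + J - 1$ by showing that: (a) The 1st column of $X'$
is the sum of the subsequent $I$ columns of $X'$. (b) The 2nd column is the
(sum of the last $J$ columns) $-$ (sum of the 3rd, 4th, ..., $(I+1)$st columns).
(c) The $I + J - 1$ columns, except for the first two, are linearly independent
(by showing that any linear combination of them by scalars
is the zero vector if and only if all scalars are zero). It will then follow
that the dimension of $\boldsymbol\eta$ is $I + J - 1$.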
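Before attempting the proof, the rank claim can be checked numerically for small $I$ and $J$. The sketch below is only a sanity check and does not replace the argument requested in the hint; the helper names `design_matrix` and `rank` are hypothetical, and the column layout assumed is the one described above (a column of 1's, then $I$ row-level indicators, then $J$ column-level indicators).

```python
from fractions import Fraction

def design_matrix(I, J):
    """IJ x (I+J+1) matrix: a 1, then the row-level and column-level indicators."""
    rows = []
    for i in range(I):
        for j in range(J):
            rows.append([1]
                        + [int(k == i) for k in range(I)]
                        + [int(k == j) for k in range(J)])
    return rows

def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c] / m[r][c]
                m[k] = [a - f * b for a, b in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r

for I, J in [(2, 2), (3, 4), (4, 5)]:
    print(I, J, rank(design_matrix(I, J)), I + J - 1)
```

In each case the computed rank should agree with $I + J - 1$, two less than the number of columns, reflecting the two linear relations in parts (a) and (b).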


3.5 In reference to the proof of Lemma 8, and under the hypothesis $H_{0,A}$, show
that the dimension of the vector $\boldsymbol\eta_A$ is $J$.

Hint: As in Exercise 3.4, one may use similar steps in showing that
$\boldsymbol\eta_A$ belongs in a $J$-dimensional vector space and thus is of dimension $J$.
To this end, consider the $IJ \times (J + 1)$ matrix $X_A'$ given below, and let
$\boldsymbol\beta_A' = (\mu, \beta_1, \ldots, \beta_J)$. Then do the following:

\[
X_A' = \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]

(i) Observe that $\boldsymbol\eta_A = X_A'\boldsymbol\beta_A$, so that $\boldsymbol\eta_A$ lies in the vector space generated
by the columns (rows) of $X_A'$.

(ii) For $I \ge 2$, it is always true that $J + 1 \le IJ$, and therefore rank $X_A' \le J + 1 = \min\{J + 1, IJ\}$.

(iii) Show that rank $X_A' = J$ by showing that: (a) The 1st column of $X_A'$
is the sum of the subsequent $J$ columns. (b) The $J$ columns, except
for the 1st one, are linearly independent. It will then follow that the
dimension of $\boldsymbol\eta_A$ is $J$.
Chapter 15 Some Topics in Nonparametric Inference


In Chapters 9, 10, 11, and 12, we concerned ourselves with the question of
point estimation, interval estimation, and testing hypotheses about (most of
the time) a real-valued parameter $\theta$. This inference hinged on the basic
premise that we were able to stipulate each time a probability model, which
was completely known except for a parameter $\theta$ (real-valued or of higher
dimension).

The natural question which arises is this: What do we do, if there is no sound 
basis for the stipulation of a probability model from which the observations 
are drawn? In such a situation, we don’t have parametric inference problems 
to worry about, because, simply, we don’t have a parametric model. In certain 
situations things may not be as bad as this, but they are nearly so. Namely, we 
are in a position to assume the existence of a parametric model which governs 
the observations. However, the number of parameters required to render the 
model meaningful is exceedingly large, and therefore inference about them is 
practically precluded. 

It is in situations like this, where the so-called nonparametric models and 
nonparametric inference enter the picture. Accordingly, a nonparametric ap- 
proach starts out with a bare minimum of assumptions, which certainly do not 
include the existence of a parametric model, and proceeds to derive inference 
for a multitude of important quantities. This chapter is devoted to discussing 
a handful of problems of this variety. 

Specifically, in the first section confidence intervals are constructed for the
mean $\mu$ of a distribution, and also the value at $x$ of the d.f. $F$, $F(x)$. The confidence
coefficients are approximately $1 - \alpha$ for large $n$. Illustrative examples are
also provided. In the following section confidence intervals are constructed
for the quantiles of a d.f. $F$. Here the concept of a confidence coefficient is
replaced by that of the coverage probability. In the subsequent two sections,
two populations are compared by means of the sign test, when the sample
sizes are equal, and the rank sum test and the Wilcoxon-Mann-Whitney test
in the general case. Some examples are also discussed. The last section consists
of two subsections. One is devoted to estimating (nonparametrically) a
p.d.f. and the formulation of a number of desirable properties of the proposed
estimate. The other subsection addresses very briefly two very important problems;
namely, the problem of regression estimation under a fixed design, and
the problem of prediction when the design is stochastic.





15.1 Some Confidence Intervals with Given Approximate Confidence Coefficient


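We are in a position to construct a confidence interval for the (unknown) mean
$\mu$ of $n$ i.i.d. observations $X_1, \ldots, X_n$ with very little information as to where
these observations are coming from. Specifically, all we have to know is that
these r.v.'s have finite mean $\mu$ and variance $\sigma^2 \in (0, \infty)$, and nothing else. Then,
by the CLT,

\[ \frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1), \qquad \bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i. \tag{1} \]

Suppose first that $\sigma$ is known. Then, for all sufficiently large $n$, the normal
approximation in (1) yields:

\[ P\left[-z_{\alpha/2} \le \frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \le z_{\alpha/2}\right] \simeq 1 - \alpha, \]

or

\[ P\left(\bar X_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar X_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) \simeq 1 - \alpha. \]

In other words,

\[ \left[\bar X_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar X_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] \tag{2} \]

is a confidence interval for $\mu$ with confidence coefficient approximately $1 - \alpha$
$(0 < \alpha < 1)$.

Now, if $\mu$ is unknown, it is quite likely that $\sigma$ is also unknown. What we do
then is to estimate $\sigma^2$ by

\[ S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X_n)^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar X_n^2 \tag{3} \]

and recall that (by Theorem 7(i) in Chapter 7):

\[ S_n^2 \xrightarrow[n \to \infty]{P} \sigma^2, \quad \text{or} \quad \frac{S_n^2}{\sigma^2} \xrightarrow[n \to \infty]{P} 1. \tag{4} \]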
Then convergences (1) and (4), along with Theorem 6(iii) in Chapter 7,
yield:

\[ \frac{\sqrt{n}(\bar X_n - \mu)/\sigma}{S_n/\sigma} = \frac{\sqrt{n}(\bar X_n - \mu)}{S_n} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1). \]

Then, proceeding as before, we obtain that

\[ \left[\bar X_n - z_{\alpha/2} \frac{S_n}{\sqrt{n}},\ \bar X_n + z_{\alpha/2} \frac{S_n}{\sqrt{n}}\right] \tag{5} \]

is a confidence interval for $\mu$ with confidence coefficient approximately $1 - \alpha$.
Here is an application of formula (5).


EXAMPLE 1 Refer to the GPA's in Example 22 of Chapter 1, where we assume that the given
GPA scores are observed values of r.v.'s $X_i$, $i = 1, \ldots, 34$, with (unknown) mean
$\mu$ and (unknown) variance $\sigma^2$, both finite. Construct a confidence interval for
$\mu$ with confidence coefficient approximately 95%.


DISCUSSION In the discussion of Example 1 in Chapter 13, we saw that:
$\sum_i x_i = 100.73$ and $\sum_i x_i^2 = 304.7885$, so that:

\[ \bar x = \frac{100.73}{34} \simeq 2.9626, \qquad s_n^2 = \frac{304.7885}{34} - \frac{(100.73)^2}{34^2} \simeq 0.187, \quad \text{and} \quad s_n \simeq 0.432. \]

Since $z_{0.025} = 1.96$, formula (5) gives:

\[ \left[2.9626 - 1.96 \times \frac{0.432}{5.831},\ 2.9626 + 1.96 \times \frac{0.432}{5.831}\right] \simeq [2.818, 3.108]. \]
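Formula (5) needs only $n$, $\sum x_i$, and $\sum x_i^2$, so it is easy to evaluate from the summary figures quoted above. The following is a minimal sketch; the function name `mean_ci_from_sums` is hypothetical, and the Normal quantile is taken from Python's standard `statistics.NormalDist` rather than from tables, so the last digit may differ slightly from a hand calculation with rounded intermediate values.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_from_sums(n, sum_x, sum_x2, conf=0.95):
    """Approximate confidence interval (5) for the mean, from summary sums."""
    xbar = sum_x / n
    s2 = sum_x2 / n - xbar ** 2                  # S_n^2 as in (3), divisor n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sqrt(s2) / sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary figures for the 34 GPA scores quoted in the discussion.
lower, upper = mean_ci_from_sums(34, 100.73, 304.7885)
print(round(lower, 3), round(upper, 3))
```

Note that (3) uses the divisor $n$, not $n - 1$; for large $n$ the difference is immaterial, which is in keeping with the asymptotic nature of the interval.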
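Another instance where a nonparametric approach provides a confidence
interval is the following. The i.i.d. r.v.'s $X_1, \ldots, X_n$ have (unknown) d.f. $F$, and
let $F_n(x)$ be the empirical d.f. based on the $X_i$'s, as defined in Application 5 to
the WLLN in Chapter 7. We saw there that

\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i(x), \qquad Y_1(x), \ldots, Y_n(x) \text{ independent r.v.'s} \sim B(1, F(x)). \]

Then, by the CLT,

\[ \frac{\sqrt{n}[F_n(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1). \tag{6} \]

Also,

\[ F_n(x)[1 - F_n(x)] \xrightarrow[n \to \infty]{P} F(x)[1 - F(x)], \quad \text{or} \quad \frac{F_n(x)[1 - F_n(x)]}{F(x)[1 - F(x)]} \xrightarrow[n \to \infty]{P} 1. \tag{7} \]

From (6) and (7) and Theorem 6(iii) in Chapter 7, it follows that:

\[ \frac{\sqrt{n}[F_n(x) - F(x)] \big/ \sqrt{F(x)[1 - F(x)]}}{\sqrt{F_n(x)[1 - F_n(x)]} \big/ \sqrt{F(x)[1 - F(x)]}} = \frac{\sqrt{n}[F_n(x) - F(x)]}{\sqrt{F_n(x)[1 - F_n(x)]}} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1). \]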
It follows that, for all sufficiently large $n$ (depending on $x$), the following
interval is a confidence interval for $F(x)$ with confidence coefficient approximately
$1 - \alpha$; namely,

\[ \left[F_n(x) - z_{\alpha/2} \sqrt{\frac{F_n(x)[1 - F_n(x)]}{n}},\ F_n(x) + z_{\alpha/2} \sqrt{\frac{F_n(x)[1 - F_n(x)]}{n}}\right]. \tag{8} \]

As an application of formula (8), consider the following example.


EXAMPLE 2 Refer again to Example 22 in Chapter 1 (see also Example 1 in this chapter), and
construct a confidence interval for $F(3)$ with approximately 95% confidence
coefficient, where $F$ is the d.f. of the r.v.'s describing the GPA scores.


DISCUSSION In this example, $n = 34$ and the number of observations
which are $\le 3$ is 18 (the following: 2.36, 2.36, 2.66, 2.68, 2.48, 2.46, 2.63, 2.44,
2.13, 2.41, 2.55, 2.80, 2.79, 2.89, 2.91, 2.75, 2.73, and 3.00). Then

\[ F_{34}(3) = \frac{18}{34} = \frac{9}{17} \simeq 0.529, \qquad \sqrt{\frac{F_{34}(3)[1 - F_{34}(3)]}{34}} \simeq 0.086, \]

and therefore the required (observed) confidence interval is:

\[ [0.529 - 1.96 \times 0.086,\ 0.529 + 1.96 \times 0.086] \simeq [0.360, 0.698]. \]
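Formula (8) depends on the data only through the count of observations not exceeding $x$. A minimal sketch follows; the function name `edf_ci` is hypothetical, and the Normal quantile comes from `statistics.NormalDist`, so the endpoints agree with the hand calculation above only up to the rounding used there.

```python
from math import sqrt
from statistics import NormalDist

def edf_ci(count_le_x, n, conf=0.95):
    """Approximate confidence interval (8) for F(x) from the empirical d.f."""
    fn = count_le_x / n                           # F_n(x)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sqrt(fn * (1 - fn) / n)
    return fn - half_width, fn + half_width

# 18 of the 34 GPA scores are <= 3.
lower, upper = edf_ci(18, 34)
print(round(lower, 3), round(upper, 3))
```

Since the interval is centered at $F_n(x)$ with a symmetric half-width, it can extend below 0 or above 1 when $F_n(x)$ is near the boundary; truncating to $[0, 1]$ would be a natural refinement, which the text does not discuss.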


REMARK 1 It should be pointed out that the confidence interval given by
(8) is of limited usefulness, because the value of (the large enough) $n$ for which
(8) holds depends on $x$.





15.2 Confidence Intervals for Quantiles of a Distribution Function


In the previous section, we constructed a confidence interval for the mean
$\mu$ of a distribution, whether its variance is known or not, with confidence
coefficient approximately a prescribed number $1 - \alpha$ $(0 < \alpha < 1)$. Also, such
an interval was constructed for each value $F(x)$ of a d.f. $F$. Now, we have seen
(in Section 3.4 of Chapter 3) that the median, and, more generally, the quantiles
of a d.f. $F$ are important quantities through which we gain information about $F$.
It would then be worth investigating the possibility of constructing confidence
intervals for quantiles of $F$. To simplify matters, it will be assumed that $F$ is
continuous, and that for each $p \in (0, 1)$, there is a unique $p$th quantile $x_p$; i.e.,
$F(x_p) = P(X \le x_p) = p$. The objective is to construct a confidence interval for
$x_p$, and, in particular, for the median $x_{0.50}$. This is done below in a rather neat
manner, except that we don't have much control on the confidence coefficient
involved. Specifically, the following result is established.
THEOREM 1 Let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with d.f. $F$, and let $Y_1, \ldots, Y_n$ be the order
statistics of the $X_i$'s. For $p \in (0, 1)$, let $x_p$ be the unique (by assumption)
$p$th quantile of $F$. Then, for any $1 \le i < j \le n$, the random interval
$[Y_i, Y_j]$ is a confidence interval for $x_p$ with confidence coefficient

\[ \sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}. \]











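PROOF Define the r.v.'s $W_j$, $j = 1, \ldots, n$, as follows:

\[ W_j = \begin{cases} 1 & \text{if } X_j \le x_p, \\ 0 & \text{if } X_j > x_p, \end{cases} \qquad j = 1, \ldots, n. \]

Then the r.v.'s $W_1, \ldots, W_n$ are independent and distributed as $B(1, p)$, since
$P(W_j = 1) = P(X_j \le x_p) = F(x_p) = p$. Therefore

\[ P(\text{at least } i \text{ of } X_1, \ldots, X_n \text{ are} \le x_p) = \sum_{k=i}^{n} \binom{n}{k} p^k (1 - p)^{n-k}. \]

However, $P(\text{at least } i \text{ of } X_1, \ldots, X_n \text{ are} \le x_p) = P(Y_i \le x_p)$. Thus,

\[ P(Y_i \le x_p) = \sum_{k=i}^{n} \binom{n}{k} p^k (1 - p)^{n-k}. \tag{9} \]

Next, for $1 \le i < j \le n$, we, clearly, have:

\[
\begin{aligned}
P(Y_i \le x_p) &= P(Y_i \le x_p,\ Y_j \ge x_p) + P(Y_i \le x_p,\ Y_j < x_p) \\
&= P(Y_i \le x_p \le Y_j) + P(Y_j < x_p) \\
&= P(Y_i \le x_p \le Y_j) + P(Y_j \le x_p),
\end{aligned} \tag{10}
\]

since $P(Y_i \le x_p,\ Y_j < x_p) = P(Y_i \le x_p,\ Y_j \le x_p) = P(Y_j \le x_p)$ by the fact
that $(Y_j \le x_p) \subseteq (Y_i \le x_p)$. Then, relations (9) and (10) yield:

\[
\begin{aligned}
P(Y_i \le x_p \le Y_j) &= \sum_{k=i}^{n} \binom{n}{k} p^k (1 - p)^{n-k} - P(Y_j \le x_p) \\
&= \sum_{k=i}^{n} \binom{n}{k} p^k (1 - p)^{n-k} - \sum_{k=j}^{n} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}.
\end{aligned} \tag{11}
\]

So, the random interval $[Y_i, Y_j]$ contains the point $x_p$ with probability $\sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$, as was to be seen. ▲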


REMARK 2

(i) From relation (11), it is clear that, although $p$ is fixed, we can enlarge the
confidence coefficient $\sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$ by taking a smaller $i$ and/or a larger
$j$. The price we pay, however, is that of having a larger confidence interval.

(ii) By the fact that the confidence interval $[Y_i, Y_j]$ does not have a prescribed
confidence coefficient $1 - \alpha$, as is the case in the usual construction of confidence
intervals, we often refer to the probability $\sum_{k=i}^{j-1} \binom{n}{k} p^k (1 - p)^{n-k}$ as the
probability of coverage of $x_p$ by $[Y_i, Y_j]$.
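Relation (11) is a finite Binomial sum and can be evaluated directly instead of being read from tables. The sketch below is a minimal illustration; the function name `coverage` and the choice $n = 10$ in the usage lines are hypothetical.

```python
from math import comb

def coverage(n, p, i, j):
    """Coverage probability of [Y_i, Y_j] for the p-th quantile, per relation (11)."""
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    # Sum of Binomial(n, p) probabilities for k = i, ..., j - 1.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, j))

# Hypothetical illustration: n = 10 observations, median (p = 0.5).
print(coverage(10, 0.5, 3, 8))   # interval [Y_3, Y_8]
print(coverage(10, 0.5, 2, 9))   # a wider interval, hence larger coverage
```

For $n = 10$ and $p = 0.5$, the interval $[Y_3, Y_8]$ has coverage $(120 + 210 + 252 + 210 + 120)/1024 = 912/1024 \simeq 0.8906$, and widening it to $[Y_2, Y_9]$ increases the coverage, as described in Remark 2(i).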


EXAMPLE 3 Consider the i.i.d. r.v.'s $X_1, \ldots, X_{20}$ with continuous d.f. $F$, which has unique
$x_{0.50}$, $x_{0.25}$, and $x_{0.75}$, and let $Y_1, \ldots, Y_{20}$ be the corresponding order statistics.
Then consider several confidence intervals for $x_{0.50}$, $x_{0.25}$, and $x_{0.75}$, and calculate
the respective coverage probabilities.


DISCUSSION Using formula (11), we obtain the coverage probabilities
listed in Table 15.1 for several confidence intervals for the median $x_{0.50}$ and the
first quartile $x_{0.25}$. For the calculation of coverage probabilities for confidence
intervals for the third quartile $x_{0.75}$, we employ the following formula, which
allows us to use the Binomial tables; namely,

\[ \sum_{k=i}^{j-1} \binom{20}{k} (0.75)^k (0.25)^{20-k} = \sum_{r=20-j+1}^{20-i} \binom{20}{r} (0.25)^r (0.75)^{20-r}. \]
Table 15.1

Quantile     Confidence Interval     Coverage Probability
x_0.50       (Y_9, Y_12)             0.4966
             (Y_8, Y_13)             0.7368
             (Y_7, Y_14)             0.8847
             (Y_6, Y_15)             0.9586
x_0.25       (Y_4, Y_6)              0.3920
             (Y_3, Y_7)              0.6945
             (Y_2, Y_8)              0.8739
             (Y_1, Y_9)              0.9559
x_0.75       (Y_15, Y_17)            0.3920
             (Y_14, Y_18)            0.6945
             (Y_13, Y_19)            0.8739
             (Y_12, Y_20)            0.9559








15.3 The Two-Sample Sign Test


In this brief section, we discuss a technique of comparing two populations 
by means of the so-called sign test. The test requires that the two samples 
available are of the same size, and makes no direct use of the values observed; 
instead, what is really used is the relative size of the components in the pairs 
of r.v.'s. Some cases where such a test would be appropriate include those in 
which one is interested in comparing the effectiveness of two different drugs 
used for the treatment of the same disease, the efficiency of two manufacturing 
processes producing the same item, the response of n customers regarding 
their preferences toward a certain consumer item, etc. 


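In more precise terms, let $X_1, \ldots, X_n$ be i.i.d. r.v.'s with continuous d.f. $F$,
and let $Y_1, \ldots, Y_n$ be i.i.d. r.v.'s with continuous d.f. $G$; the two samples are
assumed to be independent. On the basis of the $X_i$'s and $Y_i$'s, we wish to test
the null hypothesis $H_0\colon F = G$ against any one of the alternatives $H_A\colon F > G$,
$H_A'\colon F < G$, $H_A''\colon F \ne G$. The inequality $F > G$ means that $F(z) \ge G(z)$ for all
$z$, and $F(z) > G(z)$ for at least one $z$; likewise for $F < G$. To this end, set

\[ Z_i = \begin{cases} 1 & \text{if } X_i < Y_i, \\ 0 & \text{if } X_i > Y_i, \end{cases} \qquad p = P(X_i < Y_i), \quad i = 1, \ldots, n, \qquad Z = \sum_{i=1}^{n} Z_i. \tag{12} \]

It is clear that the r.v.'s $Z_1, \ldots, Z_n$ are independent and distributed as $B(1, p)$,
so that the r.v. $Z$ is distributed as $B(n, p)$. Under the hypothesis $H_0$, $p = \frac{1}{2}$,
whereas under $H_A$, $H_A'$, and $H_A''$, we have, respectively, $p > \frac{1}{2}$, $p < \frac{1}{2}$, $p \ne \frac{1}{2}$.
Thus, the problem of testing $H_0$ becomes, equivalently, that of testing $H_0\colon p = \frac{1}{2}$
in the $B(n, p)$ distribution. Formulating the relevant results, and drawing upon
Application 1 in Section 11.3 of Chapter 11, we have the following theorem.


THEOREM 2 Consider the independent samples of the i.i.d. r.v.'s $X_1, \ldots, X_n$ and
$Y_1, \ldots, Y_n$ with respective continuous d.f.'s $F$ and $G$. Then, for testing
the null hypothesis $H_0\colon F = G$ against any one of the alternatives $H_A\colon
F > G$, or $H_A'\colon F < G$, or $H_A''\colon F \ne G$, at level of significance $\alpha$, the
hypothesis $H_0$ is rejected, respectively, whenever

\[ Z \ge C, \quad \text{or} \quad Z \le C', \quad \text{or} \quad Z \le C_1 \ \text{or}\ Z \ge C_2. \tag{13} \]

The cutoff points $C$, $C'$, and $C_1$, $C_2$ are determined by the relations:

\[
\begin{aligned}
& P(Z > C) + \gamma P(Z = C) = \alpha, \quad \text{or} \quad P(Z \le C) - \gamma P(Z = C) = 1 - \alpha, \\
& P(Z < C') + \gamma' P(Z = C') = \alpha, \\
& P(Z < C_1) + \gamma_0 P(Z = C_1) = \tfrac{\alpha}{2} \quad \text{and} \quad P(Z > C_2) + \gamma_0 P(Z = C_2) = \tfrac{\alpha}{2}, \\
& \text{or} \\
& P(Z < C_1) + \gamma_0 P(Z = C_1) = \tfrac{\alpha}{2} \quad \text{and} \quad P(Z \le C_2) - \gamma_0 P(Z = C_2) = 1 - \tfrac{\alpha}{2},
\end{aligned} \tag{14}
\]

and $Z \sim B(n, 1/2)$ under $H_0$.

For large values of $n$, the CLT applies and the cutoff points are given
by the relations:

\[ C \simeq \frac{n}{2} + z_\alpha \frac{\sqrt{n}}{2}, \qquad C' \simeq \frac{n}{2} - z_\alpha \frac{\sqrt{n}}{2}, \qquad C_1 \simeq \frac{n}{2} - z_{\alpha/2} \frac{\sqrt{n}}{2}, \qquad C_2 \simeq \frac{n}{2} + z_{\alpha/2} \frac{\sqrt{n}}{2}. \tag{15} \]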











EXAMPLE 4 Refer to Example 25 in Chapter 1 regarding the plant height (in 1/8 inches) of
cross-fertilized and self-fertilized plants. Denote by $X_i$'s and $Y_i$'s, respectively,
the heights of cross-fertilized and self-fertilized plants. Then the observed values
for the 15 pairs are given in Example 25 of Chapter 1, which are reproduced
in Table 15.2 for convenience. At the level of significance $\alpha = 0.05$, test the
null hypothesis $H_0\colon F = G$, where $F$ and $G$ are the d.f.'s of the $X_i$'s and $Y_i$'s,
respectively.


Table 15.2

Pair   Cross-   Self-      Pair   Cross-   Self-
 1     188      139         9     146      132
 2      96      163        10     173      144
 3     168      160        11     186      130
 4     176      160        12     168      144
 5     153      147        13     177      102
 6     172      149        14     184      124
 7     177      149        15      96      144
 8     163      122











DISCUSSION From Table 15.2, we have:

$Z_1 = 0$, $Z_2 = 1$, $Z_3 = 0$, $Z_4 = 0$, $Z_5 = 0$, $Z_6 = 0$, $Z_7 = 0$, $Z_8 = 0$,
$Z_9 = 0$, $Z_{10} = 0$, $Z_{11} = 0$, $Z_{12} = 0$, $Z_{13} = 0$, $Z_{14} = 0$, $Z_{15} = 1$,

so that $Z = 2$. Suppose first that the alternative is $H_A''$: $F \neq G$. Then $H_0$ is
rejected in favor of $H_A''$ whenever $Z < C_1$ or $Z > C_2$, where:

\[ P(Z < C_1) + \gamma_0 P(Z = C_1) = 0.025 \quad \text{and} \quad P(Z \le C_2) - \gamma_0 P(Z = C_2) = 0.975, \]

and $Z \sim B(15, 1/2)$.

From the Binomial tables, we find $C_1 = 4$, $C_2 = 11$, and $\gamma_0 \simeq 0.178$.
Since $Z = 2 < C_1 (= 4)$, the null hypothesis is rejected. Next, test $H_0$ against
the alternative $H_A'$: $p < \frac{1}{2}$, again at level $\alpha = 0.05$. Then $H_0$ is rejected in favor
of $H_A'$ whenever $Z < C'$, where $C'$ is determined by:

\[ P(Z < C') + \gamma' P(Z = C') = 0.05, \quad Z \sim B(15, 1/2). \]

From the Binomial tables, we find $C' = 4$ and $\gamma' \simeq 0.779$. Since
$Z = 2 < C' (= 4)$, $H_0$ is rejected in favor of $H_A'$, which is consistent with what
the data say.

For the Normal approximation, we get from (15): $z_{0.025} = 1.96$, so that
$C_1 \simeq 3.703$, $C_2 \simeq 11.297$, and $H_0$ is rejected again, since $Z = 2 < C_1 (\simeq 3.703)$.
Also, $z_{0.05} = 1.645$, and hence $C' \simeq 4.314$. Again, $H_0$ is rejected in favor of $H_A'$,
since $Z = 2 < C' (\simeq 4.314)$.
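
The calculations in this example can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names `sign_statistic` and `lower_cutoff` are hypothetical, and the cutoff is computed from exact $B(15, 1/2)$ probabilities rather than from rounded table entries, so the randomization constants may differ from the table-based values in the third decimal.

```python
from math import comb, sqrt

# Plant heights from Table 15.2 (cross-fertilized X, self-fertilized Y).
cross = [188, 96, 168, 176, 153, 172, 177, 163, 146, 173, 186, 168, 177, 184, 96]
selfed = [139, 163, 160, 160, 147, 149, 149, 122, 132, 144, 130, 144, 102, 124, 144]

def sign_statistic(xs, ys):
    """Z = number of pairs with X_j < Y_j."""
    return sum(1 for x, y in zip(xs, ys) if x < y)

def lower_cutoff(n, level):
    """Return (C, gamma) with P(Z < C) + gamma * P(Z = C) = level for Z ~ B(n, 1/2)."""
    below = 0.0  # running value of P(Z < c)
    for c in range(n + 1):
        pc = comb(n, c) / 2 ** n  # P(Z = c)
        if below + pc > level:
            return c, (level - below) / pc
        below += pc
    raise ValueError("level must be less than 1")

n = len(cross)
z = sign_statistic(cross, selfed)
c1, gamma0 = lower_cutoff(n, 0.025)            # two-sided test, lower cutoff
c_prime, gamma_prime = lower_cutoff(n, 0.05)   # one-sided test against p < 1/2
c1_normal = n / 2 - 1.96 * sqrt(n) / 2         # Normal approximation from (15)

print(z, c1, round(gamma0, 3), c_prime, round(gamma_prime, 3), round(c1_normal, 3))
```

In both the exact and the approximate versions, the observed $Z$ falls below the lower cutoff, in agreement with the discussion above.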


15.4 The Rank Sum and the Wilcoxon-Mann-Whitney Two-Sample Tests


The purpose of this section is the same as that of the previous section; namely,
the comparison of the d.f.'s of two independent samples of i.i.d. r.v.'s. However,
the technique used in the last section may not apply here, as the two samples
may be of different sizes, and therefore no pairwise comparison is possible
(without discarding observations!).

So, what we have here is two independent samples consisting of the i.i.d.
r.v.'s $X_1, \ldots, X_m$ with continuous d.f. $F$, and the i.i.d. r.v.'s $Y_1, \ldots, Y_n$ with
continuous d.f. $G$. The problem is that of testing the null hypothesis $H_0$: $F = G$
against any one of the alternatives $H_A$: $F > G$, or $H_A'$: $F < G$, or $H_A''$: $F \neq G$.
The test statistic to be used makes no use of the actual values of the $X_i$'s and
the $Y_j$'s, but rather of their ranks in the combined sample, which are defined
as follows. Consider the combined sample of $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, and
arrange its members in ascending order. Because of the assumption of continuity of
$F$ and $G$, we are going to have strict inequalities with probability one. Then
the rank of $X_i$, to be denoted by $R(X_i)$, is that integer among the integers
$1, 2, \ldots, m + n$ which corresponds to the position of $X_i$. The rank $R(Y_j)$ of $Y_j$
is defined similarly. Next, consider the rank sums $R_X$ and $R_Y$ defined by:

\[ R_X = \sum_{i=1}^{m} R(X_i), \qquad R_Y = \sum_{j=1}^{n} R(Y_j). \tag{16} \]

Then

\[ R_X + R_Y = \frac{(m+n)(m+n+1)}{2}, \tag{17} \]

because $R_X + R_Y = \sum_{i=1}^{m} R(X_i) + \sum_{j=1}^{n} R(Y_j) = 1 + 2 + \cdots + (m+n) = \frac{(m+n)(m+n+1)}{2}$.
Before we go further, let us illustrate the concepts introduced
so far by a numerical example.


EXAMPLE 5

Let $m = 5$, $n = 4$, and suppose that:

$X_1 = 78$, $X_2 = 65$, $X_3 = 74$, $X_4 = 45$, $X_5 = 82$,
$Y_1 = 110$, $Y_2 = 71$, $Y_3 = 53$, $Y_4 = 50$.

Combining the $X_i$'s and the $Y_j$'s and arranging them in ascending order, we get:

  45     50     53     65     71     74     78     82     110
 (X_4)  (Y_4)  (Y_3)  (X_2)  (Y_2)  (X_3)  (X_1)  (X_5)  (Y_1)

Then: $R(X_1) = 7$, $R(X_2) = 4$, $R(X_3) = 6$, $R(X_4) = 1$, $R(X_5) = 8$,
$R(Y_1) = 9$, $R(Y_2) = 5$, $R(Y_3) = 3$, $R(Y_4) = 2$.
It follows that: $R_X = 26$, $R_Y = 19$, and, of course, $R_X + R_Y = 45 = \frac{9 \times 10}{2}$.
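
The ranking in this example can be reproduced mechanically. The sketch below is an illustration only (the helper `ranks_in_combined_sample` is a hypothetical name), and it assumes there are no ties in the combined sample, which holds with probability one under the continuity assumption.

```python
def ranks_in_combined_sample(xs, ys):
    """Return the ranks of the xs and of the ys in the combined, sorted sample.

    Assumes all values are distinct (no ties).
    """
    combined = sorted(xs + ys)
    position = {value: i + 1 for i, value in enumerate(combined)}
    return [position[x] for x in xs], [position[y] for y in ys]

xs = [78, 65, 74, 45, 82]
ys = [110, 71, 53, 50]

rank_x, rank_y = ranks_in_combined_sample(xs, ys)
R_X, R_Y = sum(rank_x), sum(rank_y)

m, n = len(xs), len(ys)
print(rank_x, rank_y, R_X, R_Y, (m + n) * (m + n + 1) // 2)
```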


The $m$ ranks $(R(X_1), \ldots, R(X_m))$ can be placed in $m$ positions out of the $m + n$
possible in $\binom{m+n}{m}$ different ways (the remaining $n$ positions will be taken up
by the $n$ ranks $(R(Y_1), \ldots, R(Y_n))$), and under the null hypothesis $H_0$, each
one of them is equally likely to occur. So, each one of the $\binom{m+n}{m}$ positions of
$(R(X_1), \ldots, R(X_m))$ has probability $1/\binom{m+n}{m}$. The alternative $H_A$ stipulates that
$F > G$; i.e., $F(z) \ge G(z)$ for all $z$, or $P(X \le z) \ge P(Y \le z)$ for all $z$, with the
inequalities strict for at least one $z$, where the r.v.'s $X$ and $Y$ are distributed
as the $X_i$'s and the $Y_j$'s, respectively. That is, under $H_A$, the $X_i$'s tend to be
smaller than any $z$ with higher probability than the $Y_j$'s are smaller than any
$z$. Consequently, since $R_X + R_Y$ is fixed $= (m+n)(m+n+1)/2$, this suggests
that the rank sum $R_X$ would tend to take small values. Therefore, $H_0$ should be
rejected in favor of $H_A$ whenever $R_X \le C$. The rejection region is determined
as follows: Consider the $\binom{m+n}{m}$ positions of the ranks $(R(X_1), \ldots, R(X_m))$, and
for each one of them, form the respective sum $R_X$. We start with the smallest
value of $R_X$ and proceed with the next smallest, etc., until we get to the $k$th
smallest, where $k$ is determined by: $k/\binom{m+n}{m} = \alpha$. (In the present setting, the
level of significance $\alpha$ is taken to be an integer multiple of $1/\binom{m+n}{m}$, if we wish
to have an exact level.) So, the rejection region consists of the $k$ smallest values
of $R_X$, where $k/\binom{m+n}{m}$ equals $\alpha$.

Likewise, the hypothesis $H_0$ is rejected in favor of the alternative $H_A'$: $F < G$
whenever $R_X \ge C'$, and the rejection region consists of the $k$ largest values
of $R_X$, where $k/\binom{m+n}{m}$ equals $\alpha$. Also, $H_0$ is rejected in favor of $H_A''$: $F \neq G$
whenever $R_X \le C_1$ or $R_X \ge C_2$, and the rejection region consists of the smallest
$r$ values of $R_X$ and the largest $r$ values of $R_X$, where $r$ satisfies the requirement
that $r/\binom{m+n}{m}$ equals $\alpha/2$.

We summarize these results in the following theorem.





THEOREM 3

Consider the independent samples of the i.i.d. r.v.'s $X_1, \ldots, X_m$ and
$Y_1, \ldots, Y_n$ with respective continuous d.f.'s $F$ and $G$. Then, for testing
the null hypothesis $H_0$: $F = G$ against any one of the alternatives $H_A$:
$F > G$, or $H_A'$: $F < G$, or $H_A''$: $F \neq G$, at level of significance $\alpha$ (so that $\alpha$
or $\frac{\alpha}{2}$ are integer multiples of $1/\binom{m+n}{m}$), the respective rejection regions
of the rank sum tests consist of: The $k$ smallest values of the rank sum
$R_X$, where $k/\binom{m+n}{m} = \alpha$; the $k$ largest values of the rank sum $R_X$, where
$k$ is as above; the $r$ smallest and the $r$ largest values of the rank sum $R_X$,
where $r/\binom{m+n}{m} = \frac{\alpha}{2}$.








REMARK 3 In theory, carrying out the test procedures described in Theorem
3 is straightforward and neat. Their practical implementation, however, is another
matter. To illustrate the difficulties involved, consider Example 5, where
$m = 5$ and $n = 4$, so that $\binom{m+n}{m} = \binom{9}{5} = 126$. Thus, one would have to consider
the 126 possible arrangements of the ranks $(R(X_1), \ldots, R(X_5))$, form the
respective rank sums, and see which ones of their values are to be included in the
rejection regions. Clearly, this is not an easy task by hand, even for such small sample sizes.
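
Although tedious by hand, the enumeration described in Remark 3 is easy to carry out on a computer for small $m$ and $n$. The following Python sketch (an illustration, with the hypothetical function name `rank_sum_null_distribution`) lists all $\binom{m+n}{m}$ placements of the $X$-ranks and tabulates the exact null distribution of $R_X$, using the fact that every placement is equally likely under $H_0$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def rank_sum_null_distribution(m, n):
    """Exact distribution of R_X under H_0: each set of m ranks is equally likely."""
    counts = Counter(sum(ranks) for ranks in combinations(range(1, m + n + 1), m))
    total = comb(m + n, m)
    return {s: Fraction(k, total) for s, k in sorted(counts.items())}

dist = rank_sum_null_distribution(5, 4)

# Probability of the 4 smallest arrangements (R_X <= 17) and of R_X <= 18.
tail_17 = sum(p for s, p in dist.items() if s <= 17)
tail_18 = sum(p for s, p in dist.items() if s <= 18)
print(min(dist), max(dist), tail_17, tail_18)
```

Here the attainable one-sided levels are multiples of $1/126$, which is why an exact level such as $0.05$ is generally not available without randomization.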

A special interesting case where the rank sum test is appropriate is that
where the d.f. $G$ of the $Y_j$'s is assumed to be of the form:

\[ G(x) = F(x - \Delta), \quad x \in \Re, \ \text{for some unknown } \Delta \in \Re. \]

In such a case, we say that $G$ is a shift of $F$ (to the right, if $\Delta > 0$, and to
the left, if $\Delta < 0$). Then the hypothesis $H_0$: $F = G$ is equivalent to testing
$\Delta = 0$, and the alternatives $H_A$: $F > G$, $H_A'$: $F < G$, $H_A''$: $F \neq G$ are equivalent
to: $\Delta > 0$, $\Delta < 0$, $\Delta \neq 0$.

Because of the difficulties associated with the implementation of the rank
sum tests, there is an alternative closely related to them, for which a Normal
approximation may be used. This is the Wilcoxon-Mann-Whitney two-sample
test. For the construction of the relevant test statistic, consider all $mn$ pairs
$(X_i, Y_j)$, and among them, count those for which $X_i > Y_j$. The resulting r.v.
is denoted by $U$ and is the statistic to be employed. More formally, let the
function $u$ be defined by:

\[ u(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z < 0. \end{cases} \tag{18} \]

Then, clearly, the statistic $U$ may be written thus:

\[ U = \sum_{i=1}^{m} \sum_{j=1}^{n} u(X_i - Y_j). \tag{19} \]

The statistics $U$, $R_X$, and $R_Y$ are related as follows.


LEMMA 1 Let $R_X$, $R_Y$, and $U$ be defined, respectively, by (16) and (19).
Then:

\[ U = R_X - \frac{m(m+1)}{2} = mn + \frac{n(n+1)}{2} - R_Y. \tag{20} \]

PROOF Deferred to Subsection 15.4.1.

On the basis of (20), Theorem 3 may be rephrased as follows in terms of
the $U$ statistic.





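THEOREM 4

In the notation of Theorem 3 and for testing the null hypothesis $H_0$ against
any one of the alternatives $H_A$, or $H_A'$, or $H_A''$ as described there, at level of
significance $\alpha$, the Wilcoxon-Mann-Whitney test rejects $H_0$, respectively:

\[
\begin{aligned}
&\text{for } U \le C, \text{ where } C \text{ is determined by } P(U \le C \mid H_0) = \alpha; \\
&\text{or } U \ge C', \text{ where } C' \text{ is determined by } P(U \ge C' \mid H_0) = \alpha; \\
&\text{or } U \le C_1 \text{ or } U \ge C_2, \text{ where } C_1 \text{ and } C_2 \text{ are determined by } \\
&\qquad P(U \le C_1 \mid H_0) = P(U \ge C_2 \mid H_0) = \tfrac{\alpha}{2}.
\end{aligned}
\tag{21}
\]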











In determining the cutoff points $C$, $C'$, $C_1$, and $C_2$ above, we are faced
with the same difficulty as that in the implementation of the rank sum tests.
However, presently, there are two ways out of it. First, tables are available for
small values of $m$ and $n$ ($n \le m \le 10$) (see page 341 in the book Handbook
of Statistical Tables, Addison-Wesley (1962), by D. B. Owen), and second, for
large values of $m$ and $n$, and under $H_0$,

\[ \frac{U - EU}{\text{s.d.}(U)} \simeq Z \sim N(0, 1), \tag{22} \]

where, under $H_0$,

\[ EU = \frac{mn}{2}, \qquad \text{Var}(U) = \frac{mn(m+n+1)}{12}. \tag{23} \]

EXAMPLE 6

Refer to Example 5 and test the hypothesis $H_0$: $F = G$ against the alternatives
$H_A''$: $F \neq G$ and $H_A$: $F > G$.

DISCUSSION In Example 5 we saw that $R_X = 26$ (and $R_Y = 19$). Since
$m = 5$ and $n = 4$, relation (20) gives: $U = 11$. From the tables cited above,
we have: $P(U \le 2) = P(U \ge 17) = 0.032$. So, $C_1 = 2$, $C_2 = 17$, and the test of
$H_0$ against $H_A''$ has level of significance 0.064; since $2 < U = 11 < 17$, $H_0$ is not
rejected in favor of $H_A''$ at this level. From the same tables,
we have that $P(U \le 3) = 0.056$, so that $C = 3$, and the test of $H_0$ against
$H_A$ has level of significance 0.056; since $U = 11 > 3$, $H_0$ is not rejected in favor
of $H_A$ either. The results stated in (23) are formulated as a
lemma below.
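
The quantities used in Example 6 can be recomputed without tables. The sketch below is illustrative only (the names `u_statistic` and `u_lower_tail` are hypothetical): it counts the pairs with $X_i > Y_j$ directly, and obtains exact lower-tail probabilities of $U$ under $H_0$ by enumerating all rank placements and applying relation (20).

```python
from fractions import Fraction
from itertools import combinations

def u_statistic(xs, ys):
    """U = number of pairs (X_i, Y_j) with X_i > Y_j, as in (19)."""
    return sum(1 for x in xs for y in ys if x > y)

def u_lower_tail(m, n, c):
    """P(U <= c | H_0), by enumeration, using U = R_X - m(m+1)/2 from (20)."""
    shift = m * (m + 1) // 2
    values = [sum(r) - shift for r in combinations(range(1, m + n + 1), m)]
    return Fraction(sum(1 for u in values if u <= c), len(values))

xs = [78, 65, 74, 45, 82]
ys = [110, 71, 53, 50]

u = u_statistic(xs, ys)
p2 = u_lower_tail(5, 4, 2)
p3 = u_lower_tail(5, 4, 3)
print(u, round(float(p2), 3), round(float(p3), 3))
```

Enumeration of this kind is practical only for small samples; for larger $m$ and $n$ the Normal approximation discussed next is the usual route.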


LEMMA 2 With $U$ defined by (19), the relations in (23) hold true under $H_0$.

PROOF Deferred to Subsection 15.4.1.

For the statistic $U$, the CLT holds. Namely,

LEMMA 3 With $U$, $EU$, and $\text{Var}(U)$ defined, respectively, by (19) and (23),

\[ \frac{U - EU}{\text{s.d.}(U)} \xrightarrow[m,n \to \infty]{d} Z \sim N(0, 1). \tag{24} \]

PROOF It is omitted.

By means of the result in (24), the cutoff points in (21)
may be determined approximately, by means of the Normal tables. That is, we
have the following corollary.


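COROLLARY (to Theorem 4 and Lemma 3) For large $m$ and $n$, the cutoff
points in (21) are given by the following approximate quantities:

\[ C \simeq \frac{mn}{2} - z_{\alpha}\sqrt{\frac{mn(m+n+1)}{12}}, \qquad C' \simeq \frac{mn}{2} + z_{\alpha}\sqrt{\frac{mn(m+n+1)}{12}}, \]
\[ C_1 \simeq \frac{mn}{2} - z_{\alpha/2}\sqrt{\frac{mn(m+n+1)}{12}}, \qquad C_2 \simeq \frac{mn}{2} + z_{\alpha/2}\sqrt{\frac{mn(m+n+1)}{12}}. \tag{25} \]

PROOF Follows immediately from (21) and (24).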














EXAMPLE 7

Refer to Example 25 in Chapter 1 (see also Example 4 here), and test the null
hypothesis $H_0$: $F = G$ at level of significance $\alpha = 0.05$ by using Theorem 4 and
the above corollary.

DISCUSSION Here $m = n = 15$, $z_{0.05} = 1.645$, and $z_{0.025} = 1.96$. Then:

$C_1 \simeq 112.50 - 1.96 \times 24.105 \simeq 112.50 - 47.2458 \simeq 65.254$,
$C_2 \simeq 112.50 + 47.2458 \simeq 159.746$,
$C \simeq 112.50 - 1.645 \times 24.105 \simeq 112.50 - 39.653 = 72.847$,
$C' \simeq 112.50 + 39.653 = 152.153$.

Next, comparing all $15 \times 15$ pairs in Table 15.3, we get the observed value of
$U = 185$.

Table 15.3

      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
X   188   96  168  176  153  172  177  163  146  173  186  168  177  184   96
Y   139  163  160  160  147  149  149  122  132  144  130  144  102  124  144

Therefore the hypothesis $H_0$ is rejected in favor of $H_A''$, since $U = 185 >
159.746 = C_2$; the null hypothesis is also rejected in favor of $H_A'$, since $U =
185 > 152.153 = C'$; but the null hypothesis is not rejected when the alternative
is $H_A$, because $U = 185 \not\le 72.847 = C$.
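
The numbers in Example 7 can be recomputed as follows. This is a hedged sketch rather than part of the original text; the function names are hypothetical, and because $\sqrt{mn(m+n+1)/12}$ is computed here to full precision, the cutoffs may differ in the second or third decimal from the rounded values above. Pairs with $X_i = Y_j$ contribute 0 to $U$, in line with definition (18).

```python
from math import sqrt

# Data from Table 15.3.
x = [188, 96, 168, 176, 153, 172, 177, 163, 146, 173, 186, 168, 177, 184, 96]
y = [139, 163, 160, 160, 147, 149, 149, 122, 132, 144, 130, 144, 102, 124, 144]

def u_statistic(xs, ys):
    """Number of pairs (X_i, Y_j) with X_i > Y_j."""
    return sum(1 for a in xs for b in ys if a > b)

def wmw_normal_cutoffs(m, n, z):
    """Approximate lower and upper cutoffs from (25) for a given Normal quantile z."""
    center = m * n / 2
    half_width = z * sqrt(m * n * (m + n + 1) / 12)
    return center - half_width, center + half_width

u = u_statistic(x, y)
c1, c2 = wmw_normal_cutoffs(len(x), len(y), 1.96)       # two-sided, alpha = 0.05
c, c_prime = wmw_normal_cutoffs(len(x), len(y), 1.645)  # one-sided, alpha = 0.05

print(u, round(c1, 2), round(c2, 2), round(c, 2), round(c_prime, 2))
print("reject vs two-sided:", u <= c1 or u >= c2)
```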


15.4.1 Proofs of Lemmas 1 and 2


PROOF OF LEMMA 1 Let $X_{(1)}, \ldots, X_{(m)}$ be the order statistics of the r.v.'s
$X_1, \ldots, X_m$, and look at the rank $R(X_{(i)})$ in the combined sample of the $X_i$'s
and the $Y_j$'s. For each $R(X_{(i)})$, there are $R(X_{(i)}) - 1$ $X_i$'s and $Y_j$'s preceding $X_{(i)}$.
Of these, $i - 1$ are $X_i$'s and hence $R(X_{(i)}) - 1 - (i - 1) = R(X_{(i)}) - i$ are $Y_j$'s.
Therefore

\[
\begin{aligned}
U &= [R(X_{(1)}) - 1] + \cdots + [R(X_{(m)}) - m] \\
  &= [R(X_{(1)}) + \cdots + R(X_{(m)})] - (1 + \cdots + m) \\
  &= [R(X_1) + \cdots + R(X_m)] - \frac{m(m+1)}{2} = R_X - \frac{m(m+1)}{2},
\end{aligned}
\]

since $R(X_{(1)}) + \cdots + R(X_{(m)})$ is simply a rearrangement of the terms in the
rank sum $R(X_1) + \cdots + R(X_m) = R_X$. Next, from the result just obtained and
(17), we have:

\[
\begin{aligned}
U &= \frac{(m+n)(m+n+1)}{2} - R_Y - \frac{m(m+1)}{2} \\
  &= \frac{(m+n)(m+n+1) - m(m+1)}{2} - R_Y = mn + \frac{n(n+1)}{2} - R_Y.
\end{aligned}
\]

PROOF OF LEMMA 2 Recall that all derivations below are carried out under
the assumption that $H_0$ ($F = G$) holds. Next, for any $i$ and $j$:

\[ E u(X_i - Y_j) = 1 \times P(X_i > Y_j) = \frac{1}{2}, \qquad E u^2(X_i - Y_j) = 1^2 \times P(X_i > Y_j) = \frac{1}{2}, \]

so that

\[ \text{Var}(u(X_i - Y_j)) = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}. \]

Therefore $EU = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{1}{2} = \frac{mn}{2}$, and

\[
\begin{aligned}
\text{Var}(U) &= \sum_{i=1}^{m} \sum_{j=1}^{n} \text{Var}(u(X_i - Y_j)) + \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{m} \sum_{l=1}^{n} \text{Cov}(u(X_i - Y_j), u(X_k - Y_l)) \quad ((i, j) \neq (k, l)) \\
&= \frac{mn}{4} + \text{sum of the covariances on the right-hand side above.}
\end{aligned}
\tag{26}
\]

Regarding the covariances, we consider the following cases. First, let $i \neq k$
and $j \neq l$. Then $\text{Cov}(u(X_i - Y_j), u(X_k - Y_l)) = 0$ by independence. Thus, it
suffices to restrict ourselves to pairs $(X_i, Y_j)$ and $(X_k, Y_l)$ for which $i = k$ and
$j \neq l$, and $i \neq k$ and $j = l$. In order to see how many such pairs there are,
consider the following array:

\[
\begin{array}{cccc}
(X_1, Y_1), & (X_1, Y_2), & \ldots, & (X_1, Y_n) \\
(X_2, Y_1), & (X_2, Y_2), & \ldots, & (X_2, Y_n) \\
\vdots & \vdots & & \vdots \\
(X_m, Y_1), & (X_m, Y_2), & \ldots, & (X_m, Y_n).
\end{array}
\]

From each one of the $m$ rows, we obtain $\binom{n}{2} \times 2 = n(n-1)$ terms of the form:
$\text{Cov}(u(X - Y), u(X - Z))$, where $X$, $Y$, $Z$ are independent r.v.'s with d.f. $F = G$.
Since $\text{Cov}(u(X - Y), u(X - Z)) = P(X > Y \text{ and } X > Z) - \frac{1}{4}$, we have then
$n(n-1)P(X > Y \text{ and } X > Z) - \frac{n(n-1)}{4}$ as a contribution to the sum of the
covariances from each row, and therefore from the $m$ rows, the contribution
to the sum of the covariances is:

\[ mn(n-1)P(X > Y \text{ and } X > Z) - \frac{mn(n-1)}{4}. \tag{27} \]

Next, from each one of the $n$ columns, we obtain $\binom{m}{2} \times 2 = m(m-1)$ terms of
the form: $\text{Cov}(u(X - Z), u(Y - Z)) = P(X > Z \text{ and } Y > Z) - \frac{1}{4}$. Therefore the
contribution from the $n$ columns to the sum of the covariances is:

\[ mn(m-1)P(X > Z \text{ and } Y > Z) - \frac{mn(m-1)}{4}. \tag{28} \]
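Now,

\[
\begin{aligned}
(X > Y \text{ and } X > Z) &= (X > Y,\ Y > Z,\ X > Z) \cup (X > Y,\ Y \le Z,\ X > Z) \\
&= (X > Y,\ Y > Z) \cup (X > Z,\ Z \ge Y) = (X > Y > Z) \cup (X > Z \ge Y),
\end{aligned}
\]

since

\[ (X > Y,\ Y > Z) \subseteq (X > Z), \quad \text{and} \quad (X > Z,\ Z \ge Y) \subseteq (X > Y). \]

Thus, ties having probability zero by continuity,

\[ P(X > Y \text{ and } X > Z) = P(X > Y > Z) + P(X > Z > Y). \tag{29} \]

Likewise,

\[
\begin{aligned}
(X > Z \text{ and } Y > Z) &= (X > Y,\ Y > Z,\ X > Z) \cup (X \le Y,\ Y > Z,\ X > Z) \\
&= (X > Y,\ Y > Z) \cup (Y \ge X,\ X > Z) = (X > Y > Z) \cup (Y \ge X > Z),
\end{aligned}
\]

since

\[ (X > Y,\ Y > Z) \subseteq (X > Z), \quad \text{and} \quad (Y \ge X,\ X > Z) \subseteq (Y > Z). \]

Thus,

\[ P(X > Z \text{ and } Y > Z) = P(X > Y > Z) + P(Y > X > Z). \tag{30} \]

The r.v.'s $X$, $Y$, and $Z$ satisfy, with probability one, exactly one of the inequalities:

$X > Y > Z$, $X > Z > Y$, $Y > X > Z$,
$Y > Z > X$, $Z > X > Y$, $Z > Y > X$,

and each one of these inequalities has probability $1/6$. Then, the expressions
in (27) and (28) become, by means of (29) and (30), respectively:

\[ \frac{mn(n-1)}{3} - \frac{mn(n-1)}{4} = \frac{mn(n-1)}{12}, \qquad \frac{mn(m-1)}{3} - \frac{mn(m-1)}{4} = \frac{mn(m-1)}{12}. \]

Then, formula (26) yields:

\[ \text{Var}(U) = \frac{mn}{4} + \frac{mn(n-1)}{12} + \frac{mn(m-1)}{12} = \frac{mn(m+n+1)}{12}. \]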

15.5 Nonparametric Curve Estimation


For quite a few years now, work on nonparametric methodology has shifted
decisively toward what is referred to as nonparametric curve estimation. Such
estimation includes estimation of d.f.'s, of p.d.f.'s or functions thereof, regression
functions, etc. The empirical d.f. is a case of nonparametric estimation of a 
d.f., although there are others as well. In this section, we are going to de- 
scribe briefly a way of estimating nonparametrically a p.d.f., and record some 
of the (asymptotic) desirable properties of the proposed estimate. Also, the 
problem of estimating, again nonparametrically, a regression function will be 
discussed very briefly. There is already a huge statistical literature in this area, 
and research is currently very active. 










15.5.1 Nonparametric Estimation of a Probability Density Function


The problem we are faced with here is the following: We are given $n$ i.i.d. r.v.'s
$X_1, \ldots, X_n$ with p.d.f. $f$ of the continuous type, about which very little is known,
and we are asked to construct a nonparametric estimate $\hat{f}_n(x)$ of $f(x)$, for each
$x \in \Re$, based on the random sample $X_1, \ldots, X_n$. The approach to be used here
is the so-called kernel-estimation approach. According to this method, we
select a (known) p.d.f. to be denoted by $K$ and to be termed a kernel, subject
to some rather minor requirements. Also, we choose a sequence of positive
numbers, denoted by $\{h_n\}$, which has the property that $h_n \to 0$ as $n \to \infty$
and also satisfies some additional requirements. The numbers $h_n$, $n \ge 1$, are
referred to as bandwidths for a reason to be seen below. Then, on the basis of
the random sample $X_1, \ldots, X_n$, the kernel $K$, and the bandwidths $h_n$, $n \ge 1$,
the proposed estimate of $f(x)$ is $\hat{f}_n(x)$ given by:

\[ \hat{f}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right). \tag{31} \]





EXAMPLE 8

Construct the kernel estimate of $f(x)$, for each $x \in \Re$, by using the $U(-1, 1)$
kernel; i.e., by taking

\[ K(x) = \frac{1}{2} \ \text{for } -1 \le x \le 1, \ \text{and } 0 \text{ otherwise.} \]

DISCUSSION Here, it is convenient to use the indicator notation, namely,
$K(x) = \frac{1}{2} I_{[-1,1]}(x)$ (where, it is recalled, $I_A(x) = 1$ if $x \in A$, and 0 if $x \in A^c$).
Then the estimate (31) becomes as follows:

\[ \hat{f}_n(x) = \frac{1}{2 n h_n} \sum_{i=1}^{n} I_{[-1,1]}\!\left(\frac{x - X_i}{h_n}\right), \quad x \in \Re. \tag{32} \]

So, $I_{[-1,1]}\!\left(\frac{x - X_i}{h_n}\right) = 1$ if and only if $x - h_n \le X_i \le x + h_n$; in other words,
in forming $\hat{f}_n(x)$, we use only those observations $X_i$ which lie in the window
$[x - h_n, x + h_n]$. The breadth of this window is, clearly, determined by $h_n$, and
this is the reason that $h_n$ is referred to as the bandwidth.
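
The estimate (31) with the uniform kernel of Example 8 amounts to counting the observations in the window $[x - h_n, x + h_n]$ and dividing by $2 n h_n$. A minimal Python sketch follows; the sample values and the bandwidth are made up for illustration, and the function names are hypothetical.

```python
def uniform_kernel(t):
    """U(-1, 1) kernel: 1/2 on [-1, 1], 0 elsewhere."""
    return 0.5 if -1.0 <= t <= 1.0 else 0.0

def kernel_density_estimate(x, sample, h, kernel=uniform_kernel):
    """Kernel estimate (31) of f(x) with bandwidth h."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Illustrative data (not from the text) and an arbitrary bandwidth.
sample = [0.1, 0.4, 0.5, 0.9, 1.6]
h = 0.25
estimate = kernel_density_estimate(0.5, sample, h)

# Equivalent window count: observations within [x - h, x + h], divided by 2nh.
in_window = sum(1 for xi in sample if 0.5 - h <= xi <= 0.5 + h)
print(estimate, in_window / (2 * len(sample) * h))
```

Any other kernel satisfying the requirements stated below can be passed through the `kernel` argument without changing the rest of the sketch.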

Usually, the minimum of assumptions required of the kernel $K$ and the
bandwidth $h_n$, in order for us to be able to establish some desirable properties
of the estimate $\hat{f}_n(x)$ given in (31), are the following:

\[
\begin{aligned}
&K \text{ is bounded; i.e., } \sup\{K(x);\ x \in \Re\} < \infty. \\
&xK(x) \text{ tends to 0 as } x \to \pm\infty; \text{ i.e., } |xK(x)| \xrightarrow[|x| \to \infty]{} 0. \\
&K \text{ is symmetric about 0; i.e., } K(-x) = K(x),\ x \in \Re.
\end{aligned}
\tag{33}
\]

\[
\text{As } n \to \infty: \quad \text{(i) } (0 <)\ h_n \to 0, \qquad \text{(ii) } n h_n \to \infty, \qquad \text{(iii) } n h_n^2 \to \infty.
\tag{34}
\]

REMARK 4 Observe that requirements (33) are met for the kernel used
in (32). Furthermore, the convergences in (34) are satisfied if one takes, e.g.,
$h_n = n^{-\alpha}$ with $0 < \alpha < 1/2$. Below, we record three (asymptotic) results
regarding the estimate $\hat{f}_n(x)$ given in (31).





THEOREM 5

Under assumptions (33) and (34)(i), the estimate $\hat{f}_n(x)$ given in (31) is an
asymptotically unbiased estimate of $f(x)$ for every $x \in \Re$ at which $f$ is
continuous; i.e.,

\[ E\hat{f}_n(x) \to f(x) \quad \text{as } n \to \infty. \]

THEOREM 6

Under assumptions (33) and (34)(i), (ii), the estimate $\hat{f}_n(x)$ given in (31)
is a consistent in quadratic mean estimate of $f(x)$ for every $x \in \Re$ at
which $f$ is continuous; i.e.,

\[ E[\hat{f}_n(x) - f(x)]^2 \to 0 \quad \text{as } n \to \infty. \]

THEOREM 7

Under assumptions (33) and (34)(i)-(iii), the estimate $\hat{f}_n(x)$ given in (31)
is asymptotically normal, when properly normalized, for every $x \in \Re$ at
which $f$ is continuous; i.e.,

\[ \frac{\hat{f}_n(x) - E\hat{f}_n(x)}{\text{s.d.}(\hat{f}_n(x))} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1). \]











We have no intention of even attempting to prove any of the theorems just 
stated. Their proofs can be found in the second reference given below. In 
closing this section, it is only fitting to mention that the concept of kernel 
estimation of a p.d.f. was introduced by Murray Rosenblatt in 1956, and it 
was popularized by a fundamental paper by E. Parzen in 1962. The relevant 
references are as follows: 

“Remarks on some nonparametric estimates of a density function” by 
M. Rosenblatt in the Annals of Mathematical Statistics, Vol. 27 (1956), pages 
823-835. “On estimation of a probability density function and mode” by 
E. Parzen in the Annals of Mathematical Statistics, Vol. 33 (1962), pages 1065- 
1076. 


15.5.2 Nonparametric Regression Estimation


In Chapter 13, a simple linear regression model was studied and its usefulness 
was demonstrated by means of specific examples. It was also stated that there 
is a definite need for more general regression models, where the linearity is 
retained, or it is discarded altogether. This issue is addressed, to a considerable 
extent, in this section. 

Specifically, the model considered here is the following: For each $n =
1, 2, \ldots$, consider points $x_{n1}, \ldots, x_{nn}$ in $\Re$, and at each one of them, an
observation is taken, to be denoted by $Y_{ni}$, $i = 1, \ldots, n$. It is assumed that $Y_{ni}$ is
equal to some unknown function $g$ evaluated at $x_{ni}$ except for an error $e_{ni}$; i.e.,

\[ Y_{ni} = g(x_{ni}) + e_{ni}, \quad i = 1, \ldots, n. \tag{35} \]

On the errors $e_{ni}$, $i = 1, \ldots, n$, we make the usual assumptions that they are
i.i.d. r.v.'s with $Ee_{ni} = 0$ and $\text{Var}(e_{ni}) = \sigma^2 < \infty$.

The model in (1) of Chapter 13 is a very special case of the model just
described. In the first place, the points where observations are taken are allowed
here to depend on $n$, and second, the regression function in (1) of Chapter
13 is obtained from the present one by setting $g(x) = \beta_1 + \beta_2 x$, so that $\beta_1 + \beta_2 x_i = g(x_i)$,
$i = 1, \ldots, n$. The function $g$ in (35) is subject only to the requirement that it is
defined on a bounded subset $S$ of $\Re$ and that it is continuous.

The objective here is to (nonparametrically) estimate the function $g(x)$, for
each $x \in S$, by means of the observations $Y_{n1}, \ldots, Y_{nn}$. The proposed estimate
is the statistic $\hat{g}_n(x; \mathbf{x}_n)$ defined as follows:

\[ \hat{g}_n(x; \mathbf{x}_n) = \sum_{i=1}^{n} w_{ni}(x; \mathbf{x}_n) Y_{ni}, \tag{36} \]

where $\mathbf{x}_n = (x_{n1}, \ldots, x_{nn})$ and the $w_{ni}$ are weights, properly chosen, which depend
on the particular point $x$ in $S$ and also on the points $x_{n1}, \ldots, x_{nn}$ where
observations are taken. The weights are required to satisfy certain conditions, and
there is considerable flexibility in choosing them. We do not intend to enter
here into this kind of detail. Instead, we restrict ourselves to stating three basic
properties that the estimate defined in (36) satisfies.





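Under suitable regularity conditions, the estimate $\hat{g}_n(x; \mathbf{x}_n)$ defined in
(36) is an asymptotically unbiased estimate of $g(x)$; i.e.,

\[ E\hat{g}_n(x; \mathbf{x}_n) \xrightarrow[n \to \infty]{} g(x), \quad \text{for every } x \in S. \tag{37} \]

Under suitable regularity conditions, the estimate $\hat{g}_n(x; \mathbf{x}_n)$ is a consistent
in quadratic mean estimate of $g(x)$; i.e.,

\[ E[\hat{g}_n(x; \mathbf{x}_n) - g(x)]^2 \xrightarrow[n \to \infty]{} 0, \quad \text{for every } x \in S. \tag{38} \]

Under suitable regularity conditions, the estimate $\hat{g}_n(x; \mathbf{x}_n)$, properly
normalized, is asymptotically Normal; i.e.,

\[ \frac{\hat{g}_n(x; \mathbf{x}_n) - E\hat{g}_n(x; \mathbf{x}_n)}{\text{s.d.}(\hat{g}_n(x; \mathbf{x}_n))} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1), \quad \text{for every } x \in S. \]

Also,

\[ \frac{\hat{g}_n(x; \mathbf{x}_n) - g(x)}{\text{s.d.}(\hat{g}_n(x; \mathbf{x}_n))} \xrightarrow[n \to \infty]{d} Z \sim N(0, 1), \quad \text{for every } x \in S. \tag{39} \]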








Convergences (37) and (38) provide asymptotic optimal properties for the
estimate proposed in (36). If it happens that the error variance $\sigma^2$ is known,
or an estimate of it is available, then convergence (39) provides a way of
constructing a confidence interval for $g(x)$ with confidence coefficient
approximately equal to $1 - \alpha$ for large $n$. (See also Exercise 5.2.)

In the regression model considered in Chapter 13, one of the basic tenets
was that the point $x$ at which an observation $Y$ is to be made can be chosen,
more or less, at will. This, however, need not always be the case. Instead, it
may happen that the point $x$ itself is the observed value of a r.v. $X$. Thus, the
setup here is as follows: A r.v. $X$ is observed, and if $X = x$, then an observation
is taken at the point $x$.

In this framework, several questions may be posed. One of the most important
is this: Given that $X = x$, construct a predictor of $Y$ corresponding to $x$.
The proposed predictor is the conditional expectation of $Y$, given $X = x$, call
it $m(x)$; i.e.,

\[ m(x) = E(Y \mid X = x). \tag{40} \]

The quantity defined in (40) is an unknown function of $x$, since the conditional
p.d.f. of $Y$, given $X = x$, is unknown. The problem which then arises is that
of estimating $m(x)$. The discussion in the remainder of this section revolves
around this question.

Clearly, the estimation of $m(x)$ must be made on the basis of available data.
To this effect, we assume that we have at our disposal $n$ pairs of r.v.'s $(X_i, Y_i)$,
$i = 1, \ldots, n$, which are independent and distributed as the pair $(X, Y)$. Then
the proposed estimate of $m(x)$, call it $\hat{m}_n(x)$, is the following:

\[ \hat{m}_n(x) = \frac{\hat{g}_n(x)}{\hat{f}_n(x)}, \quad \text{where} \quad \hat{g}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} Y_i K\!\left(\frac{x - X_i}{h_n}\right), \tag{41} \]

and $\hat{f}_n(x)$ is given in (31); i.e.,

\[ \hat{f}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right). \]
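
The ratio in (41) can be computed directly; the common factor $1/(n h_n)$ cancels between numerator and denominator, so only the kernel weights are needed. The sketch below is illustrative: the Normal kernel is an assumption made here for convenience (the text leaves $K$ general), the data are made up, and the function names are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian_kernel(t):
    """Standard Normal p.d.f., used here as the kernel K (an illustrative choice)."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def regression_estimate(x, xs, ys, h, kernel=gaussian_kernel):
    """Estimate m(x) = E(Y | X = x) by the ratio in (41)."""
    weights = [kernel((x - xi) / h) for xi in xs]
    denominator = sum(weights)  # proportional to the density estimate (31)
    if denominator == 0.0:
        raise ValueError("density estimate is zero at x; no observations near x")
    return sum(w * y for w, y in zip(weights, ys)) / denominator

# Illustrative data: three design points placed symmetrically about 0.
xs = [-1.0, 0.0, 1.0]
ys = [1.0, 2.0, 3.0]
m_hat = regression_estimate(0.0, xs, ys, h=0.5)
print(m_hat)
```

The guard against a zero denominator reflects the requirement $f(x) > 0$ that appears in the asymptotic result below.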


The estimated predictor $\hat{m}_n(x)$ has several asymptotic optimal properties of
which we single out only one here; namely, asymptotic normality.

Let $\hat{m}_n(x)$ be the estimate of the predictor $m(x)$ given by (41) and (40),
respectively, and let $\sigma^2(x)$ be defined by:

\[ \sigma^2(x) = \frac{\sigma_0^2(x)}{f(x)} \int_{-\infty}^{\infty} K^2(t)\,dt \quad (\text{for } f(x) > 0), \tag{42} \]

where $\sigma_0^2(x)$ is the conditional variance of $Y$, given $X = x$. Then the
estimated predictor $\hat{m}_n(x)$, properly normalized, is asymptotically Normal;
i.e.,

\[ \sqrt{n h_n}\,[\hat{m}_n(x) - m(x)] \xrightarrow[n \to \infty]{d} N(0, \sigma^2(x)). \tag{43} \]











The variance $\sigma^2(x)$ of the limiting normal distribution is unknown, but an
estimate of it may be constructed. Then the convergence (43) may be used to
set up a confidence interval for $m(x)$ with confidence coefficient approximately
equal to $1 - \alpha$ for large $n$. (See also Exercise 5.3.)

In closing this section, it should be mentioned that its purpose has been
not to list detailed assumptions and present proofs (many of which are beyond
the assumed level of this book anyway), but rather to point out that there are
regression results available in the literature, way beyond the simple linear
model studied in Chapter 13. Finally, let us mention a piece of terminology
used in the literature; namely, the regression model defined by (35) is referred
to as a fixed design regression model, whereas the one defined by (40) is called
a stochastic design regression model. The reasons for this are obvious. In the
former case, the points where observations are taken are fixed, whereas in the
latter case they are values of a r.v. $X$.


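5.1 Note: All convergences in this exercise hold for continuity points $x$ of
$f(x)$. In Theorem 7, it is stated that

\[ \frac{\hat{f}_n(x) - E\hat{f}_n(x)}{\text{s.d.}(\hat{f}_n(x))} \xrightarrow[n \to \infty]{d} N(0, 1). \]

By this fact, Theorem 5, and some additional assumptions, it is also shown
that

\[ \frac{\hat{f}_n(x) - f(x)}{\text{s.d.}(\hat{f}_n(x))} \xrightarrow[n \to \infty]{d} N(0, 1). \tag{44} \]

(i) Use expression (31) in order to show that

\[ \text{Var}(\hat{f}_n(x)) = \frac{1}{n h_n^2} \text{Var}\!\left(K\!\left(\frac{x - X_1}{h_n}\right)\right), \quad \text{so that} \quad (n h_n)\text{Var}(\hat{f}_n(x)) = \frac{1}{h_n} \text{Var}\!\left(K\!\left(\frac{x - X_1}{h_n}\right)\right). \]

(ii) Use the formula $\text{Var}(X) = EX^2 - (EX)^2$, and the transformation
$\frac{x - y}{h_n} = u$ with $-\infty < u < \infty$, in order to show that

\[ (n h_n)\text{Var}(\hat{f}_n(x)) = \int_{-\infty}^{\infty} K^2(u) f(x - h_n u)\,du - h_n \left[\int_{-\infty}^{\infty} K(u) f(x - h_n u)\,du\right]^2. \]

Now, it can be shown that

\[ \int_{-\infty}^{\infty} K(u) f(x - h_n u)\,du \xrightarrow[n \to \infty]{} f(x) \int_{-\infty}^{\infty} K(u)\,du = f(x), \]

and

\[ \int_{-\infty}^{\infty} K^2(u) f(x - h_n u)\,du \xrightarrow[n \to \infty]{} f(x) \int_{-\infty}^{\infty} K^2(u)\,du. \]

From these results, assumption (34)(i), and part (ii), it follows then
that

\[ \sigma_n^2(x) \overset{\text{def}}{=} (n h_n)\text{Var}(\hat{f}_n(x)) \xrightarrow[n \to \infty]{} f(x) \int_{-\infty}^{\infty} K^2(u)\,du. \tag{45} \]

(iii) From convergence (45) and Theorem 7, conclude (by means of the
Corollary to Theorem 5 in Chapter 7) that

\[ \hat{f}_n(x) - E\hat{f}_n(x) \xrightarrow[n \to \infty]{d} 0, \quad \text{and hence} \quad \hat{f}_n(x) - E\hat{f}_n(x) \xrightarrow[n \to \infty]{P} 0. \tag{46} \]

(iv) Use convergence (46) and Theorem 5 in order to conclude that $\hat{f}_n(x)$ is
a consistent estimate of $f(x)$ (in the probability sense); i.e., $\hat{f}_n(x) \xrightarrow[n \to \infty]{P} f(x)$.

Set

\[ \hat{\sigma}_n^2(x) = \hat{f}_n(x) \int_{-\infty}^{\infty} K^2(u)\,du. \tag{47} \]

(v) Use relations (45) and (47) to conclude that

\[ \frac{\hat{\sigma}_n^2(x)}{\sigma_n^2(x)} \xrightarrow[n \to \infty]{P} 1, \quad \text{or} \quad \frac{\hat{\sigma}_n(x)}{\sigma_n(x)} \xrightarrow[n \to \infty]{P} 1. \tag{48} \]

Since, by (44) and (45),

\[ \frac{\hat{f}_n(x) - f(x)}{\text{s.d.}(\hat{f}_n(x))} = \frac{\sqrt{n h_n}\,[\hat{f}_n(x) - f(x)]}{\sqrt{n h_n \text{Var}(\hat{f}_n(x))}} = \frac{\sqrt{n h_n}\,[\hat{f}_n(x) - f(x)]}{\sigma_n(x)} \xrightarrow[n \to \infty]{d} N(0, 1), \]

it follows from this and (48) (by means of Theorem 6 in Chapter 7) that

\[ \frac{\sqrt{n h_n}\,[\hat{f}_n(x) - f(x)]}{\hat{\sigma}_n(x)} = \frac{\sqrt{n h_n}\,[\hat{f}_n(x) - f(x)]/\sigma_n(x)}{\hat{\sigma}_n(x)/\sigma_n(x)} \xrightarrow[n \to \infty]{d} N(0, 1). \tag{49} \]

(vi) Use convergence (49) in order to conclude that, for all sufficiently
large $n$,

\[ P\!\left[\hat{f}_n(x) - \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2} \le f(x) \le \hat{f}_n(x) + \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2}\right] \simeq 1 - \alpha; \]

i.e., the interval $\left[\hat{f}_n(x) - \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2},\ \hat{f}_n(x) + \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2}\right]$ is a confidence
interval for $f(x)$ with confidence coefficient approximately $1 - \alpha$, for
all sufficiently large $n$.
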

5.2 Refer to convergence (39), and set $\sigma_n(x) = \text{s.d.}(\hat{g}_n(x; \mathbf{x}_n))$. Use relation (39)
in order to conclude that, for all sufficiently large $n$,

\[ P[\hat{g}_n(x; \mathbf{x}_n) - z_{\alpha/2}\sigma_n(x) \le g(x) \le \hat{g}_n(x; \mathbf{x}_n) + z_{\alpha/2}\sigma_n(x)] \simeq 1 - \alpha. \tag{50} \]

Thus, if $\sigma_n(x)$ is known, then expression (50) states that the interval
$[\hat{g}_n(x; \mathbf{x}_n) - z_{\alpha/2}\sigma_n(x),\ \hat{g}_n(x; \mathbf{x}_n) + z_{\alpha/2}\sigma_n(x)]$ is a confidence interval for $g(x)$
with confidence coefficient approximately $1 - \alpha$, for all sufficiently large
$n$. If $\sigma_n(x)$ is not known, but a suitable estimate of it, $\hat{\sigma}_n(x)$, can be
constructed, then the interval $[\hat{g}_n(x; \mathbf{x}_n) - z_{\alpha/2}\hat{\sigma}_n(x),\ \hat{g}_n(x; \mathbf{x}_n) + z_{\alpha/2}\hat{\sigma}_n(x)]$ is
a confidence interval for $g(x)$ with confidence coefficient approximately
$1 - \alpha$, for all sufficiently large $n$. One arrives at this conclusion working as
in Exercise 5.1.


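5.3 Refer to convergence (43), and go through the usual manipulations to
conclude that, for all sufficiently large $n$,

\[ P\!\left[\hat{m}_n(x) - \frac{\sigma(x)}{\sqrt{n h_n}} z_{\alpha/2} \le m(x) \le \hat{m}_n(x) + \frac{\sigma(x)}{\sqrt{n h_n}} z_{\alpha/2}\right] \simeq 1 - \alpha. \tag{51} \]

Thus, if $\sigma(x)$ is known, then expression (51) states that the interval
$\left[\hat{m}_n(x) - \frac{\sigma(x)}{\sqrt{n h_n}} z_{\alpha/2},\ \hat{m}_n(x) + \frac{\sigma(x)}{\sqrt{n h_n}} z_{\alpha/2}\right]$ is a confidence interval for $m(x)$ with
confidence coefficient approximately $1 - \alpha$, for all sufficiently large $n$. If $\sigma(x)$ is
not known, but a suitable estimate of it, $\hat{\sigma}_n(x)$, can be constructed, then the
interval $\left[\hat{m}_n(x) - \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2},\ \hat{m}_n(x) + \frac{\hat{\sigma}_n(x)}{\sqrt{n h_n}} z_{\alpha/2}\right]$ is a confidence interval for
$m(x)$ with confidence coefficient approximately $1 - \alpha$, for all sufficiently
large $n$.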
Tables

The tabulated quantity is

\[ \sum_{j=0}^{k} \binom{n}{j} p^j (1 - p)^{n-j}. \]

                                        p
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16
2 0 0.8789 0.7656 0.6602 0.5625 0.4727 0.3906 0.3164 0.2500 
1 0.9961 0.9844 0.9648 0.9375 0.9023 0.8594 0.8086 0.7500 
2 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
3 0 0.8240 0.6699 0.5364 0.4219 0.3250 0.2441 0.1780 0.1250 
1 0.9888 0.9570 0.9077 0.8437 0.7681 0.6836 0.5933 0.5000 
2 0.9998 0.9980 0.9934 0.9844 0.9695 0.9473 0.9163 0.8750 
3 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
4 0 0.7725 0.5862 0.4358 0.3164 0.2234 0.1526 0.1001 0.0625 
1 0.9785 0.9211 0.8381 0.7383 0.6296 0.5188 0.4116 0.3125 
2 0.9991 0.9929 0.9773 0.9492 0.9065 0.8484 0.7749 0.6875 
3 1.0000 0.9998 0.9988 0.9961 0.9905 0.9802 0.9634 0.9375 
4 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
5 0 0.7242 0.5129 0.3541 0.2373 0.1536 0.0954 0.0563 0.0312 
1 0.9656 0.8793 0.7627 0.6328 0.5027 0.3815 0.2753 0.1875 
2 0.9978 0.9839 0.9512 0.8965 0.8200 0.7248 0.6160 0.5000 
3 0.9999 0.9989 0.9947 0.9844 0.9642 0.9308 0.8809 0.8125 
4 1.0000 1.0000 0.9998 0.9990 0.9970 0.9926 0.9840 0.9687 
5 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
6 0 0.6789 0.4488 0.2877 0.1780 0.1056 0.0596 0.0317 0.0156 
1 0.9505 0.8335 0.6861 0.5339 0.3936 0.2742 0.1795 0.1094 
2 0.9958 0.9709 0.9159 0.8306 0.7208 0.5960 0.4669 0.3437 
3 0.9998 0.9970 0.9866 0.9624 0.9192 0.8535 0.7650 0.6562 
4 1.0000 0.9998 0.9988 0.9954 0.9868 0.9694 0.9389 0.8906 
5 1.0000 1.0000 1.0000 0.9998 0.9991 0.9972 0.9930 0.9844 
6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
7 0 0.6365 0.3927 0.2338 0.1335 0.0726 0.0373 0.0178 0.0078 
1 0.9335 0.7854 0.6114 0.4449 0.3036 0.1937 0.1148 0.0625 
2 0.9929 0.9537 0.8728 0.7564 0.6186 0.4753 0.3412 0.2266 
3 0.9995 0.9938 0.9733 0.9294 0.8572 0.7570 0.6346 0.5000 
4 1.0000 0.9995 0.9965 0.9871 0.9656 0.9260 0.8628 0.7734 
5 1.0000 1.0000 0.9997 0.9987 0.9952 0.9868 0.9693 0.9375 

7 6 1.0000 1.0000 1.0000 0.9999 0.9997 0.9990 0.9969 0.9922 

7 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

8 0 0.5967 0.3436 0.1899 0.1001 0.0499 0.0233 0.0100 0.0039 

1 0.9150 0.7363 0.5406 0.3671 0.2314 0.1350 0.0724 0.0352 

2 0.9892 0.9327 0.8238 0.6785 0.5201 0.3697 0.2422 0.1445 

3 0.9991 0.9888 0.9545 0.8862 0.7826 0.6514 0.5062 0.3633 

4 1.0000 0.9988 0.9922 0.9727 0.9318 0.8626 0.7630 0.6367 

5 1.0000 0.9999 0.9991 0.9958 0.9860 0.9640 0.9227 0.8555 

6 1.0000 1.0000 0.9999 0.9996 0.9983 0.9944 0.9849 0.9648 

7 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9987 0.9961 

8 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

9 0 0.5594 0.3007 0.1543 0.0751 0.0343 0.0146 0.0056 0.0020 

1 0.8951 0.6872 0.4748 0.3003 0.1747 0.0931 0.0451 0.0195 

2 0.9846 0.9081 0.7707 0.6007 0.4299 0.2817 0.1679 0.0898 

3 0.9985 0.9817 0.9300 0.8343 0.7006 0.5458 0.3907 0.2539 

4 0.9999 0.9975 0.9851 0.9511 0.8851 0.7834 0.6506 0.5000 

5 1.0000 0.9998 0.9978 0.9900 0.9690 0.9260 0.8528 0.7461 

6 1.0000 1.0000 0.9998 0.9987 0.9945 0.9830 0.9577 0.9102 

7 1.0000 1.0000 1.0000 0.9999 0.9994 0.9977 0.9926 0.9805 

8 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9994 0.9980 

9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

10 0 0.5245 0.2631 0.1254 0.0563 0.0236 0.0091 0.0032 0.0010 

1 0.8741 0.6389 0.4147 0.2440 0.1308 0.0637 0.0278 0.0107 

2 0.9790 0.8805 0.7152 0.5256 0.3501 0.2110 0.1142 0.0547 

3 0.9976 0.9725 0.9001 0.7759 0.6160 0.4467 0.2932 0.1719 

4 0.9998 0.9955 0.9748 0.9219 0.8275 0.6943 0.5369 0.3770 

5 1.0000 0.9995 0.9955 0.9803 0.9428 0.8725 0.7644 0.6230 

6 1.0000 1.0000 0.9994 0.9965 0.9865 0.9616 0.9118 0.8281 

7 1.0000 1.0000 1.0000 0.9996 0.9979 0.9922 0.9773 0.9453 

8 1.0000 1.0000 1.0000 1.0000 0.9998 0.9990 0.9964 0.9893 

9 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9990 

10 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

11 0 0.4917 0.2302 0.1019 0.0422 0.0162 0.0057 0.0018 0.0005 

1 0.8522 0.5919 0.3605 0.1971 0.0973 0.0432 0.0170 0.0059 

2 0.9724 0.8503 0.6589 0.4552 0.2816 0.1558 0.0764 0.0327 

3 0.9965 0.9610 0.8654 0.7133 0.5329 0.3583 0.2149 0.1133 

4 0.9997 0.9927 0.9608 0.8854 0.7614 0.6014 0.4303 0.2744 

5 1.0000 0.9990 0.9916 0.9657 0.9068 0.8057 0.6649 0.5000 

6 1.0000 0.9999 0.9987 0.9924 0.9729 0.9282 0.8473 0.7256 

7 1.0000 1.0000 0.9999 0.9988 0.9943 0.9807 0.9487 0.8867 

8 1.0000 1.0000 1.0000 0.9999 0.9992 0.9965 0.9881 0.9673 

9 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9983 0.9941 

10 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995 

11 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

12 0 0.4610 0.2014 0.0828 0.0317 0.0111 0.0036 0.0010 0.0002 

1 0.8297 0.5467 0.3120 0.1584 0.0720 0.0291 0.0104 0.0032 

2 0.9649 0.8180 0.6029 0.3907 0.2240 0.1135 0.0504 0.0193 

3 0.9950 0.9472 0.8267 0.6488 0.4544 0.2824 0.1543 0.0730 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
12 4 0.9995 0.9887 0.9429 0.8424 0.6900 0.5103 0.3361 0.1938 
5 1.0000 0.9982 0.9858 0.9456 0.8613 0.7291 0.5622 0.3872 
6 1.0000 0.9998 0.9973 0.9857 0.9522 0.8822 0.7675 0.6128 
7 1.0000 1.0000 0.9996 0.9972 0.9876 0.9610 0.9043 0.8062 
8 1.0000 1.0000 1.0000 0.9996 0.9977 0.9905 0.9708 0.9270 
9 1.0000 1.0000 1.0000 1.0000 0.9997 0.9984 0.9938 0.9807 
10 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9992 0.9968 
11 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 
12 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
13 0 0.4321 0.1762 0.0673 0.0238 0.0077 0.0022 0.0006 0.0001 
1 0.8067 0.5035 0.2690 0.1267 0.0530 0.0195 0.0063 0.0017 
2 0.9565 0.7841 0.5484 0.3326 0.1765 0.0819 0.0329 0.0112 
3 0.9931 0.9310 0.7847 0.5843 0.3824 0.2191 0.1089 0.0461 
4 0.9992 0.9835 0.9211 0.7940 0.6164 0.4248 0.2565 0.1334 
5 0.9999 0.9970 0.9778 0.9198 0.8078 0.6470 0.4633 0.2905 
6 1.0000 0.9996 0.9952 0.9757 0.9238 0.8248 0.6777 0.5000 
7 1.0000 1.0000 0.9992 0.9944 0.9765 0.9315 0.8445 0.7095 
8 1.0000 1.0000 0.9999 0.9990 0.9945 0.9795 0.9417 0.8666 
9 1.0000 1.0000 1.0000 0.9999 0.9991 0.9955 0.9838 0.9539 
10 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 0.9968 0.9888 
11 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9983 
12 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
13 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
14 0 0.4051 0.1542 0.0546 0.0178 0.0053 0.0014 0.0003 0.0001 
1 0.7833 0.4626 0.2312 0.1010 0.0388 0.0130 0.0038 0.0009 
2 0.9471 0.7490 0.4960 0.2811 0.1379 0.0585 0.0213 0.0065 
3 0.9908 0.9127 0.7404 0.5213 0.3181 0.1676 0.0756 0.0287 
4 0.9988 0.9770 0.8955 0.7415 0.5432 0.3477 0.1919 0.0898 
5 0.9999 0.9953 0.9671 0.8883 0.7480 0.5637 0.3728 0.2120 
6 1.0000 0.9993 0.9919 0.9617 0.8876 0.7581 0.5839 0.3953 
7 1.0000 0.9999 0.9985 0.9897 0.9601 0.8915 0.7715 0.6047 
8 1.0000 1.0000 0.9998 0.9978 0.9889 0.9615 0.8992 0.7880 
9 1.0000 1.0000 1.0000 0.9997 0.9976 0.9895 0.9654 0.9102 
10 1.0000 1.0000 1.0000 1.0000 0.9996 0.9979 0.9911 0.9713 
11 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9984 0.9935 
12 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9991 
13 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
15 0 0.3798 0.1349 0.0444 0.0134 0.0036 0.0009 0.0002 0.0000 
1 0.7596 0.4241 0.1981 0.0802 0.0283 0.0087 0.0023 0.0005 
2 0.9369 0.7132 0.4463 0.2361 0.1069 0.0415 0.0136 0.0037 
3 0.9881 0.8922 0.6946 0.4613 0.2618 0.1267 0.0518 0.0176 
4 0.9983 0.9689 0.8665 0.6865 0.4729 0.2801 0.1410 0.0592 
5 0.9998 0.9930 0.9537 0.8516 0.6840 0.4827 0.2937 0.1509 
6 1.0000 0.9988 0.9873 0.9434 0.8435 0.6852 0.4916 0.3036 
7 1.0000 0.9998 0.9972 0.9827 0.9374 0.8415 0.6894 0.5000 
8 1.0000 1.0000 0.9995 0.9958 0.9799 0.9352 0.8433 0.6964 
9 1.0000 1.0000 0.9999 0.9992 0.9949 0.9790 0.9364 0.8491 
10 1.0000 1.0000 1.0000 0.9999 0.9990 0.9947 0.9799 0.9408 
11 1.0000 1.0000 1.0000 1.0000 0.9999 0.9990 0.9952 0.9824 
12 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9992 0.9963 
Table 1 (continued) 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
15 13 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995 
14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

16 0 0.3561 0.1181 0.0361 0.0100 0.0025 0.0005 0.0001 0.0000 

1 0.7359 0.3879 0.1693 0.0635 0.0206 0.0057 0.0014 0.0003 
2 0.9258 0.6771 0.3998 0.1971 0.0824 0.0292 0.0086 0.0021 
3 0.9849 0.8698 0.6480 0.4050 0.2134 0.0947 0.0351 0.0106 
4 0.9977 0.9593 0.8342 0.6302 0.4069 0.2226 0.1020 0.0384 
5 0.9997 0.9900 0.9373 0.8103 0.6180 0.4067 0.2269 0.1051 
6 1.0000 0.9981 0.9810 0.9204 0.7940 0.6093 0.4050 0.2272 
7 1.0000 0.9997 0.9954 0.9729 0.9082 0.7829 0.6029 0.4018 
8 1.0000 1.0000 0.9991 0.9925 0.9666 0.9001 0.7760 0.5982 
9 1.0000 1.0000 0.9999 0.9984 0.9902 0.9626 0.8957 0.7728 
10 1.0000 1.0000 1.0000 0.9997 0.9977 0.9888 0.9609 0.8949 
11 1.0000 1.0000 1.0000 1.0000 0.9996 0.9974 0.9885 0.9616 
12 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995 0.9975 0.9894 
13 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9979 
14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
17 0 0.3338 0.1033 0.0293 0.0075 0.0017 0.0003 0.0001 0.0000 
1 0.7121 0.3542 0.1443 0.0501 0.0149 0.0038 0.0008 0.0001 
2 0.9139 0.6409 0.3566 0.1637 0.0631 0.0204 0.0055 0.0012 
3 0.9812 0.8457 0.6015 0.3530 0.1724 0.0701 0.0235 0.0064 
4 0.9969 0.9482 0.7993 0.5739 0.3464 0.1747 0.0727 0.0245 
5 0.9996 0.9862 0.9180 0.7653 0.5520 0.3377 0.1723 0.0717 
6 1.0000 0.9971 0.9728 0.8929 0.7390 0.5333 0.3271 0.1662 
7 1.0000 0.9995 0.9927 0.9598 0.8725 0.7178 0.5163 0.3145 
8 1.0000 0.9999 0.9984 0.9876 0.9484 0.8561 0.7002 0.5000 
9 1.0000 1.0000 0.9997 0.9969 0.9828 0.9391 0.8433 0.6855 
10 1.0000 1.0000 1.0000 0.9994 0.9954 0.9790 0.9323 0.8338 
11 1.0000 1.0000 1.0000 0.9999 0.9990 0.9942 0.9764 0.9283 
12 1.0000 1.0000 1.0000 1.0000 0.9998 0.9987 0.9935 0.9755 
13 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9987 0.9936 
14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9988 
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
18 0 0.3130 0.0904 0.0238 0.0056 0.0012 0.0002 0.0000 0.0000 
1 0.6885 0.3228 0.1227 0.0395 0.0108 0.0025 0.0005 0.0001 
2 0.9013 0.6051 0.3168 0.1353 0.0480 0.0142 0.0034 0.0007 
3 0.9770 0.8201 0.5556 0.3057 0.1383 0.0515 0.0156 0.0038 
4 0.9959 0.9354 0.7622 0.5187 0.2920 0.1355 0.0512 0.0154 
5 0.9994 0.9814 0.8958 0.7175 0.4878 0.2765 0.1287 0.0481 
6 0.9999 0.9957 0.9625 0.8610 0.6806 0.4600 0.2593 0.1189 
7 1.0000 0.9992 0.9889 0.9431 0.8308 0.6486 0.4335 0.2403 
8 1.0000 0.9999 0.9973 0.9807 0.9247 0.8042 0.6198 0.4073 
9 1.0000 1.0000 0.9995 0.9946 0.9721 0.9080 0.7807 0.5927 
10 1.0000 1.0000 0.9999 0.9988 0.9915 0.9640 0.8934 0.7597 
11 1.0000 1.0000 1.0000 0.9998 0.9979 0.9885 0.9571 0.8811 
Table 1 (continued) 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
18 12 1.0000 1.0000 1.0000 1.0000 0.9996 0.9970 0.9860 0.9519 
13 1.0000 1.0000 1.0000 1.0000 0.9999 0.9994 0.9964 0.9846 
14 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 0.9962 
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
19 0 0.2934 0.0791 0.0193 0.0042 0.0008 0.0001 0.0000 0.0000 
1 0.6650 0.2938 0.1042 0.0310 0.0078 0.0016 0.0003 0.0000 
2 0.8880 0.5698 0.2804 0.1113 0.0364 0.0098 0.0021 0.0004 
3 0.9722 0.7933 0.5108 0.2631 0.1101 0.0375 0.0103 0.0022 
4 0.9947 0.9209 0.7235 0.4654 0.2440 0.1040 0.0356 0.0096 
5 0.9992 0.9757 0.8707 0.6678 0.4266 0.2236 0.0948 0.0318 
6 0.9999 0.9939 0.9500 0.8251 0.6203 0.3912 0.2022 0.0835 
7 1.0000 0.9988 0.9840 0.9225 0.7838 0.5779 0.3573 0.1796 
8 1.0000 0.9998 0.9957 0.9713 0.8953 0.7459 0.5383 0.3238 
9 1.0000 1.0000 0.9991 0.9911 0.9573 0.8691 0.7103 0.5000 
10 1.0000 1.0000 0.9998 0.9977 0.9854 0.9430 0.8441 0.6762 
11 1.0000 1.0000 1.0000 0.9995 0.9959 0.9793 0.9292 0.8204 
12 1.0000 1.0000 1.0000 0.9999 0.9990 0.9938 0.9734 0.9165 
13 1.0000 1.0000 1.0000 1.0000 0.9998 0.9985 0.9919 0.9682 
14 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9980 0.9904 
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9996 0.9978 
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9996 
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
20 0 0.2751 0.0692 0.0157 0.0032 0.0006 0.0001 0.0000 0.0000 
1 0.6418 0.2669 0.0883 0.0243 0.0056 0.0011 0.0002 0.0000 
2 0.8741 0.5353 0.2473 0.0913 0.0275 0.0067 0.0013 0.0002 
3 0.9670 0.7653 0.4676 0.2252 0.0870 0.0271 0.0067 0.0013 
4 0.9933 0.9050 0.6836 0.4148 0.2021 0.0790 0.0245 0.0059 
5 0.9989 0.9688 0.8431 0.6172 0.3695 0.1788 0.0689 0.0207 
6 0.9999 0.9916 0.9351 0.7858 0.5598 0.3284 0.1552 0.0577 
7 1.0000 0.9981 0.9776 0.8982 0.7327 0.5079 0.2894 0.1316 
8 1.0000 0.9997 0.9935 0.9591 0.8605 0.6829 0.4591 0.2517 
9 1.0000 0.9999 0.9984 0.9861 0.9379 0.8229 0.6350 0.4119 
10 1.0000 1.0000 0.9997 0.9961 0.9766 0.9153 0.7856 0.5881 
11 1.0000 1.0000 0.9999 0.9991 0.9926 0.9657 0.8920 0.7483 
12 1.0000 1.0000 1.0000 0.9998 0.9981 0.9884 0.9541 0.8684 
13 1.0000 1.0000 1.0000 1.0000 0.9996 0.9968 0.9838 0.9423 
14 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 0.9953 0.9793 
15 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9989 0.9941 
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9987 
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
21 0 0.2579 0.0606 0.0128 0.0024 0.0004 0.0001 0.0000 0.0000 
1 0.6189 0.2422 0.0747 0.0190 0.0040 0.0007 0.0001 0.0000 
2 0.8596 0.5018 0.2175 0.0745 0.0206 0.0046 0.0008 0.0001 
3 0.9612 0.7366 0.4263 0.1917 0.0684 0.0195 0.0044 0.0007 
4 0.9917 0.8875 0.6431 0.3674 0.1662 0.0596 0.0167 0.0036 
Table 1 (continued) 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
21 5 0.9986 0.9609 0.8132 0.5666 0.3172 0.1414 0.0495 0.0133 
6 0.9998 0.9888 0.9179 0.7436 0.5003 0.2723 0.1175 0.0392 
7 1.0000 0.9973 0.9696 0.8701 0.6787 0.4405 0.2307 0.0946 
8 1.0000 0.9995 0.9906 0.9439 0.8206 0.6172 0.3849 0.1917 
9 1.0000 0.9999 0.9975 0.9794 0.9137 0.7704 0.5581 0.3318 

10 1.0000 1.0000 0.9995 0.9936 0.9645 0.8806 0.7197 0.5000 
11 1.0000 1.0000 0.9999 0.9983 0.9876 0.9468 0.8454 0.6682 
12 1.0000 1.0000 1.0000 0.9996 0.9964 0.9799 0.9269 0.8083 
13 1.0000 1.0000 1.0000 0.9999 0.9991 0.9936 0.9708 0.9054 
14 1.0000 1.0000 1.0000 1.0000 0.9998 0.9983 0.9903 0.9608 
15 1.0000 1.0000 1.0000 1.0000 1.0000 0.9996 0.9974 0.9867 
16 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9994 0.9964 
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

22 0 0.2418 0.0530 0.0104 0.0018 0.0003 0.0000 0.0000 0.0000 
1 0.5963 0.2195 0.0631 0.0149 0.0029 0.0005 0.0001 0.0000 
2 0.8445 0.4693 0.1907 0.0606 0.0154 0.0031 0.0005 0.0001 
3 0.9548 0.7072 0.3871 0.1624 0.0535 0.0139 0.0028 0.0004 
4 0.9898 0.8687 0.6024 0.3235 0.1356 0.0445 0.0113 0.0022 
5 0.9981 0.9517 0.7813 0.5168 0.2700 0.1107 0.0352 0.0085 
6 0.9997 0.9853 0.8983 0.6994 0.4431 0.2232 0.0877 0.0262 
7 1.0000 0.9963 0.9599 0.8385 0.6230 0.3774 0.1812 0.0669 
8 1.0000 0.9992 0.9866 0.9254 0.7762 0.5510 0.3174 0.1431 
9 1.0000 0.9999 0.9962 0.9705 0.8846 0.7130 0.4823 0.2617 

10 1.0000 1.0000 0.9991 0.9900 0.9486 0.8393 0.6490 0.4159 
11 1.0000 1.0000 0.9998 0.9971 0.9804 0.9220 0.7904 0.5841 
12 1.0000 1.0000 1.0000 0.9993 0.9936 0.9675 0.8913 0.7383 
13 1.0000 1.0000 1.0000 0.9999 0.9982 0.9885 0.9516 0.8569 
14 1.0000 1.0000 1.0000 1.0000 0.9996 0.9966 0.9818 0.9331 
15 1.0000 1.0000 1.0000 1.0000 0.9999 0.9991 0.9943 0.9738 
16 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9985 0.9915 
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9978 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 

23 0 0.2266 0.0464 0.0084 0.0013 0.0002 0.0000 0.0000 0.0000 
1 0.5742 0.1987 0.0532 0.0116 0.0021 0.0003 0.0000 0.0000 
2 0.8290 0.4381 0.1668 0.0492 0.0115 0.0021 0.0003 0.0000 
3 0.9479 0.6775 0.3503 0.1370 0.0416 0.0099 0.0018 0.0002 
4 0.9876 0.8485 0.5621 0.2832 0.1100 0.0330 0.0076 0.0013 
5 0.9976 0.9413 0.7478 0.4685 0.2280 0.0859 0.0247 0.0053 
6 0.9996 0.9811 0.8763 0.6537 0.3890 0.1810 0.0647 0.0173 
7 1.0000 0.9949 0.9484 0.8037 0.5668 0.3196 0.1403 0.0466 
8 1.0000 0.9988 0.9816 0.9037 0.7283 0.4859 0.2578 0.1050 
9 1.0000 0.9998 0.9944 0.9592 0.8507 0.6522 0.4102 0.2024 

10 1.0000 1.0000 0.9986 0.9851 0.9286 0.7919 0.5761 0.3388 
11 1.0000 1.0000 0.9997 0.9954 0.9705 0.8910 0.7285 0.5000 
Table 1 (continued) 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
23 12 1.0000 1.0000 0.9999 0.9988 0.9895 0.9504 0.8471 0.6612 
13 1.0000 1.0000 1.0000 0.9997 0.9968 0.9806 0.9252 0.7976 
14 1.0000 1.0000 1.0000 0.9999 0.9992 0.9935 0.9686 0.8950 
15 1.0000 1.0000 1.0000 1.0000 0.9998 0.9982 0.9888 0.9534 
16 1.0000 1.0000 1.0000 1.0000 1.0000 0.9996 0.9967 0.9827 
17 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9992 0.9947 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9987 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
21 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
24 0 0.2125 0.0406 0.0069 0.0010 0.0001 0.0000 0.0000 0.0000 
1 0.5524 0.1797 0.0448 0.0090 0.0015 0.0002 0.0000 0.0000 
2 0.8131 0.4082 0.1455 0.0398 0.0086 0.0014 0.0002 0.0000 
3 0.9405 0.6476 0.3159 0.1150 0.0322 0.0070 0.0011 0.0001 
4 0.9851 0.8271 0.5224 0.2466 0.0886 0.0243 0.0051 0.0008 
5 0.9970 0.9297 0.7130 0.4222 0.1911 0.0661 0.0172 0.0033 
6 0.9995 0.9761 0.8522 0.6074 0.3387 0.1453 0.0472 0.0113 
7 0.9999 0.9932 0.9349 0.7662 0.5112 0.2676 0.1072 0.0320 
8 1.0000 0.9983 0.9754 0.8787 0.6778 0.4235 0.2064 0.0758 
9 1.0000 0.9997 0.9920 0.9453 0.8125 0.5898 0.3435 0.1537 
10 1.0000 0.9999 0.9978 0.9787 0.9043 0.7395 0.5035 0.2706 
11 1.0000 1.0000 0.9995 0.9928 0.9574 0.8538 0.6618 0.4194 
12 1.0000 1.0000 0.9999 0.9979 0.9835 0.9281 0.7953 0.5806 
13 1.0000 1.0000 1.0000 0.9995 0.9945 0.9693 0.8911 0.7294 
14 1.0000 1.0000 1.0000 0.9999 0.9984 0.9887 0.9496 0.8463 
15 1.0000 1.0000 1.0000 1.0000 0.9996 0.9964 0.9799 0.9242 
16 1.0000 1.0000 1.0000 1.0000 0.9999 0.9990 0.9932 0.9680 
17 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9981 0.9887 
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9996 0.9967 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9992 
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
21 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
22 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
25 0 0.1992 0.0355 0.0056 0.0008 0.0001 0.0000 0.0000 0.0000 
1 0.5312 0.1623 0.0377 0.0070 0.0011 0.0001 0.0000 0.0000 
2 0.7968 0.3796 0.1266 0.0321 0.0064 0.0010 0.0001 0.0000 
3 0.9325 0.6176 0.2840 0.0962 0.0248 0.0049 0.0007 0.0001 
4 0.9823 0.8047 0.4837 0.2137 0.0710 0.0178 0.0033 0.0005 
5 0.9962 0.9169 0.6772 0.3783 0.1591 0.0504 0.0119 0.0020 
6 0.9993 0.9703 0.8261 0.5611 0.2926 0.1156 0.0341 0.0073 
7 0.9999 0.9910 0.9194 0.7265 0.4573 0.2218 0.0810 0.0216 
8 1.0000 0.9977 0.9678 0.8506 0.6258 0.3651 0.1630 0.0539 
9 1.0000 0.9995 0.9889 0.9287 0.7704 0.5275 0.2835 0.1148 
10 1.0000 0.9999 0.9967 0.9703 0.8756 0.6834 0.4335 0.2122 
11 1.0000 1.0000 0.9992 0.9893 0.9408 0.8110 0.5926 0.3450 
12 1.0000 1.0000 0.9998 0.9966 0.9754 0.9003 0.7369 0.5000 
13 1.0000 1.0000 1.0000 0.9991 0.9911 0.9538 0.8491 0.6550 
14 1.0000 1.0000 1.0000 0.9998 0.9972 0.9814 0.9240 0.7878 
15 1.0000 1.0000 1.0000 1.0000 0.9992 0.9935 0.9667 0.8852 
16 1.0000 1.0000 1.0000 1.0000 0.9998 0.9981 0.9874 0.9462 
17 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9960 0.9784 
Table 1 (continued) 
p 
n k 1/16 2/16 3/16 4/16 5/16 6/16 7/16 8/16 
25 18 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9989 0.9927 
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9980 
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 
21 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 
22 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 
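
The entries of Table 1 appear to be the cumulative binomial probabilities $\sum_{j=0}^{k}\binom{n}{j}p^{j}(1-p)^{n-j}$, with p given in sixteenths. As a check on individual entries, and not as the method by which the table was produced, the following Python sketch recomputes them with exact rational arithmetic. The function name `binomial_cdf` is an illustrative choice and does not come from the text.

```python
from fractions import Fraction
from math import comb


def binomial_cdf(n: int, k: int, p: Fraction) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    The sum is carried out with exact rationals and converted to a
    float only at the end, so rounding to four places is reliable.
    """
    q = 1 - p
    total = sum(comb(n, j) * p**j * q ** (n - j) for j in range(k + 1))
    return float(total)


# Recompute a few entries; compare each with the row (n, k) and column p.
for n, k, p in [(7, 3, Fraction(8, 16)),
                (12, 5, Fraction(8, 16)),
                (8, 0, Fraction(1, 16))]:
    print(n, k, p, round(binomial_cdf(n, k, p), 4))
```

For p = 8/16 the distribution is symmetric, so the entries for k and n - k - 1 should add to 1; this gives a quick consistency check on that column.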
Table 2 

The Cumulative Poisson Distribution 

The tabulated quantity is 

$\sum_{j=0}^{k} e^{-\lambda} \frac{\lambda^{j}}{j!}$. 

    λ 
k   0.001        0.005        0.010        0.015        0.020        0.025 
0   0.9990 0050  0.9950 1248  0.9900 4983  0.9851 1194  0.9801 9867  0.9753 099 
1   0.9999 9950  0.9999 8754  0.9999 5033  0.9998 8862  0.9998 0264  0.9996 927 
2   1.0000 0000  0.9999 9998  0.9999 9983  0.9999 9945  0.9999 9868  0.9999 974 
3                1.0000 0000  1.0000 0000  1.0000 0000  0.9999 9999  1.0000 000 
4                                                       1.0000 0000  1.0000 000 

    λ 
k   0.030      0.035      0.040      0.045      0.050      0.055 
0   0.970 446  0.965 605  0.960 789  0.955 997  0.951 229  0.946 485 
1   0.999 559  0.999 402  0.999 221  0.999 017  0.998 791  0.998 542 
2   0.999 996  0.999 993  0.999 990  0.999 985  0.999 980  0.999 973 
3   1.000 000  1.000 000  1.000 000  1.000 000  1.000 000  1.000 000 

    λ 
k   0.060      0.065      0.070      0.075      0.080      0.085 
0   0.941 765  0.937 067  0.932 394  0.927 743  0.923 116  0.918 512 
1   0.998 270  0.997 977  0.997 661  0.997 324  0.996 966  0.996 586 
2   0.999 966  0.999 956  0.999 946  0.999 934  0.999 920  0.999 904 
3   0.999 999  0.999 999  0.999 999  0.999 999  0.999 998  0.999 998 
4   1.000 000  1.000 000  1.000 000  1.000 000  1.000 000  1.000 000 

    λ 
k   0.090      0.095      0.100      0.200      0.300      0.400 
0   0.913 931  0.909 373  0.904 837  0.818 731  0.740 818  0.670 320 
1   0.996 185  0.995 763  0.995 321  0.982 477  0.963 064  0.938 448 
2   0.999 886  0.999 867  0.999 845  0.998 852  0.996 401  0.992 074 
3   0.999 997  0.999 997  0.999 996  0.999 943  0.999 734  0.999 224 
4   1.000 000  1.000 000  1.000 000  0.999 998  0.999 984  0.999 939 
5                                    1.000 000  0.999 999  0.999 996 
6                                               1.000 000  1.000 000 

    λ 
k   0.500      0.600      0.700      0.800      0.900      1.000 
0   0.606 531  0.548 812  0.496 585  0.449 329  0.406 570  0.367 879 
1   0.909 796  0.878 099  0.844 195  0.808 792  0.772 482  0.735 759 
2   0.985 612  0.976 885  0.965 858  0.952 577  0.937 143  0.919 699 
3   0.998 248  0.996 642  0.994 247  0.990 920  0.986 541  0.981 012 
4   0.999 828  0.999 606  0.999 214  0.998 589  0.997 656  0.996 340 
5   0.999 986  0.999 961  0.999 910  0.999 816  0.999 657  0.999 406 
6   0.999 999  0.999 997  0.999 991  0.999 979  0.999 957  0.999 917 
7   1.000 000  1.000 000  0.999 999  0.999 998  0.999 995  0.999 990 
8                         1.000 000  1.000 000  1.000 000  0.999 999 
9                                                          1.000 000 
    λ 
k   1.20    1.40    1.60    1.80    2.00    2.50    3.00    3.50 
0   0.3012  0.2466  0.2019  0.1653  0.1353  0.0821  0.0498  0.0302 
1   0.6626  0.5918  0.5249  0.4628  0.4060  0.2873  0.1991  0.1359 
2   0.8795  0.8335  0.7834  0.7306  0.6767  0.5438  0.4232  0.3208 
3   0.9662  0.9463  0.9212  0.8913  0.8571  0.7576  0.6472  0.5366 
4   0.9923  0.9857  0.9763  0.9636  0.9473  0.8912  0.8153  0.7254 
5   0.9985  0.9968  0.9940  0.9896  0.9834  0.9580  0.9161  0.8576 
6   0.9997  0.9994  0.9987  0.9974  0.9955  0.9858  0.9665  0.9347 
7   1.0000  0.9999  0.9997  0.9994  0.9989  0.9958  0.9881  0.9733 
8           1.0000  1.0000  0.9999  0.9998  0.9989  0.9962  0.9901 
9                           1.0000  1.0000  0.9997  0.9989  0.9967 
10                                          0.9999  0.9997  0.9990 
11                                          1.0000  0.9999  0.9997 
12                                                  1.0000  0.9999 
13                                                          1.0000 
    λ 
k   4.00    4.50    5.00    6.00    7.00    8.00    9.00    10.00 
0   0.0183  0.0111  0.0067  0.0025  0.0009  0.0003  0.0001  0.0000 
1   0.0916  0.0611  0.0404  0.0174  0.0073  0.0030  0.0012  0.0005 
2   0.2381  0.1736  0.1247  0.0620  0.0296  0.0138  0.0062  0.0028 
3   0.4335  0.3423  0.2650  0.1512  0.0818  0.0424  0.0212  0.0103 
4   0.6288  0.5321  0.4405  0.2851  0.1730  0.0996  0.0550  0.0293 
5   0.7851  0.7029  0.6160  0.4457  0.3007  0.1912  0.1157  0.0671 
6   0.8893  0.8311  0.7622  0.6063  0.4497  0.3134  0.2068  0.1301 
7   0.9489  0.9134  0.8666  0.7440  0.5987  0.4530  0.3239  0.2202 
8   0.9786  0.9597  0.9319  0.8472  0.7291  0.5925  0.4577  0.3328 
9   0.9919  0.9829  0.9682  0.9161  0.8305  0.7166  0.5874  0.4579 
10  0.9972  0.9933  0.9863  0.9574  0.9015  0.8159  0.7060  0.5830 
11  0.9991  0.9976  0.9945  0.9799  0.9467  0.8881  0.8030  0.6968 
12  0.9997  0.9992  0.9980  0.9912  0.9730  0.9362  0.8758  0.7916 
13  0.9999  0.9997  0.9993  0.9964  0.9872  0.9658  0.9261  0.8645 
14  1.0000  0.9999  0.9998  0.9986  0.9943  0.9827  0.9585  0.9165 
15          1.0000  0.9999  0.9995  0.9976  0.9918  0.9780  0.9513 
16                  1.0000  0.9998  0.9990  0.9963  0.9889  0.9730 
17                          0.9999  0.9996  0.9984  0.9947  0.9857 
18                          1.0000  0.9999  0.9993  0.9976  0.9928 
19                                          0.9997  0.9989  0.9965 
20                                  1.0000  0.9999  0.9996  0.9984 
21                                          1.0000  0.9998  0.9993 
22                                                  0.9999  0.9997 
23                                                  1.0000  0.9999 
24                                                          1.0000 
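
The tabulated quantity of Table 2, the sum of $e^{-\lambda}\lambda^{j}/j!$ over j = 0, ..., k, can be recomputed term by term. The Python sketch below is one straightforward way to do so, offered as an illustration rather than as the procedure used to build the table; the name `poisson_cdf` is an assumption made for this example.

```python
from math import exp


def poisson_cdf(k: int, lam: float) -> float:
    """Return sum_{j=0}^{k} exp(-lam) * lam**j / j!.

    Each term is obtained from the previous one through the ratio
    lam / j, which avoids computing large factorials directly.
    """
    term = exp(-lam)      # the j = 0 term
    total = term
    for j in range(1, k + 1):
        term *= lam / j   # turns the (j-1)th term into the jth term
        total += term
    return total


# Recompute a few entries; compare each with row k and column lambda.
for k, lam in [(0, 1.0), (2, 2.0), (5, 4.0)]:
    print(k, lam, round(poisson_cdf(k, lam), 4))
```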
Table 3 

The Normal Distribution 

The tabulated quantity is 

$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^{2}/2}\,dt$ $[\Phi(-x) = 1 - \Phi(x)]$. 

x Φ(x) x Φ(x) x Φ(x) x Φ(x) 
0.00 0.500000 0.45 0.673645 0.90 0.815940 1.35 0.911492 
0.01 0.503989 0.46 0.677242 0.91 0.818589 1.36 0.913085 
0.02 0.507978 0.47 0.680822 0.92 0.821214 1.37 0.914657 
0.03 0.511966 0.48 0.684386 0.93 0.823814 1.38 0.916207 
0.04 0.515953 0.49 0.687933 0.94 0.826391 1.39 0.917736 
0.05 0.519939 0.50 0.691462 0.95 0.828944 1.40 0.919243 
0.06 0.523922 0.51 0.694974 0.96 0.831472 1.41 0.920730 
0.07 0.527903 0.52 0.698468 0.97 0.833977 1.42 0.922196 
0.08 0.531881 0.53 0.701944 0.98 0.836457 1.43 0.923641 
0.09 0.535856 0.54 0.705401 0.99 0.838913 1.44 0.925066 
0.10 0.539828 0.55 0.708840 1.00 0.841345 1.45 0.926471 
0.11 0.543795 0.56 0.712260 1.01 0.843752 1.46 0.927855 
0.12 0.547758 0.57 0.715661 1.02 0.846136 1.47 0.929219 
0.13 0.551717 0.58 0.719043 1.03 0.848495 1.48 0.930563 
0.14 0.555670 0.59 0.722405 1.04 0.850830 1.49 0.931888 


0.15 0.559618 0.60 0.725747 1.05 0.853141 1.50 0.933193 
0.16 0.563559 0.61 0.729069 1.06 0.855428 1.51 0.934478 
0.17 0.567495 0.62 0.732371 1.07 0.857690 1.52 0.935745 


0.18 0.571424 0.63 0.735653 1.08 0.859929 1.53 0.936992 
0.19 0.575345 0.64 0.738914 1.09 0.862143 1.54 0.938220 
0.20 0.579260 0.65 0.742154 1.10 0.864334 1.55 0.939429 
0.21 0.583166 0.66 0.745373 1.11 0.866500 1.56 0.940620 
0.22 0.587064 0.67 0.748571 1.12 0.868643 1.57 0.941792 
0.23 0.590954 0.68 0.751748 1.13 0.870762 1.58 0.942947 
0.24 0.594835 0.69 0.754903 1.14 0.872857 1.59 0.944083 
0.25 0.598706 0.70 0.758036 1.15 0.874928 1.60 0.945201 
0.26 0.602568 0.71 0.761148 1.16 0.876976 1.61 0.946301 
0.27 0.606420 0.72 0.764238 1.17 0.879000 1.62 0.947384 
0.28 0.610261 0.73 0.767305 1.18 0.881000 1.63 0.948449 
0.29 0.614092 0.74 0.770350 1.19 0.882977 1.64 0.949497 
0.30 0.617911 0.75 0.773373 1.20 0.884930 1.65 0.950529 


0.31 0.621720 0.76 0.776373 1.21 0.886861 1.66 0.951543 
0.32 0.625516 0.77 0.779350 1.22 0.888768 1.67 0.952540 
0.33 0.629300 0.78 0.782305 1.23 0.890651 1.68 0.953521 


0.34 0.633072 0.79 0.785236 1.24 0.892512 1.69 0.954486 
0.35 0.636831 0.80 0.788145 1.25 0.894350 1.70 0.955435 
0.36 0.640576 0.81 0.791030 1.26 0.896165 1.71 0.956367 
0.37 0.644309 0.82 0.793892 1.27 0.897958 1.72 0.957284 
0.38 0.648027 0.83 0.796731 1.28 0.899727 1.73 0.958185 
0.39 0.651732 0.84 0.799546 1.29 0.901475 1.74 0.959070 
0.40 0.655422 0.85 0.802337 1.30 0.903200 1.75 0.959941 
0.41 0.659097 0.86 0.805105 1.31 0.904902 1.76 0.960796 
0.42 0.662757 0.87 0.807850 1.32 0.906582 1.77 0.961636 
0.43 0.666402 0.88 0.810570 1.33 0.908241 1.78 0.962462 
0.44 0.670031 0.89 0.813267 1.34 0.909877 1.79 0.963273 


Table 3 (continued) 
x Φ(x) x Φ(x) x Φ(x) x Φ(x) 
1.80 0.964070 2.30 0.989276 2.80 0.997445 3.30 0.999517 
1.81 0.964852 2.31 0.989556 2.81 0.997523 3.31 0.999534 
1.82 0.965620 2.32 0.989830 2.82 0.997599 3.32 0.999550 
1.83 0.966375 2.33 0.990097 2.83 0.997673 3.33 0.999566 
1.84 0.967116 2.34 0.990358 2.84 0.997744 3.34 0.999581 
1.85 0.967843 2.35 0.990613 2.85 0.997814 3.35 0.999596 
1.86 0.968557 2.36 0.990863 2.86 0.997882 3.36 0.999610 
1.87 0.969258 2.37 0.991106 2.87 0.997948 3.37 0.999624 
1.88 0.969946 2.38 0.991344 2.88 0.998012 3.38 0.999638 
1.89 0.970621 2.39 0.991576 2.89 0.998074 3.39 0.999651 
1.90 0.971283 2.40 0.991802 2.90 0.998134 3.40 0.999663 
1.91 0.971933 2.41 0.992024 2.91 0.998193 3.41 0.999675 
1.92 0.972571 2.42 0.992240 2.92 0.998250 3.42 0.999687 
1.93 0.973197 2.43 0.992451 2.93 0.998305 3.43 0.999698 
1.94 0.973810 2.44 0.992656 2.94 0.998359 3.44 0.999709 
1.95 0.974412 2.45 0.992857 2.95 0.998411 3.45 0.999720 
1.96 0.975002 2.46 0.993053 2.96 0.998462 3.46 0.999730 
1.97 0.975581 2.47 0.993244 2.97 0.998511 3.47 0.999740 
1.98 0.976148 2.48 0.993431 2.98 0.998559 3.48 0.999749 
1.99 0.976705 2.49 0.993613 2.99 0.998605 3.49 0.999758 
2.00 0.977250 2.50 0.993790 3.00 0.998650 3.50 0.999767 
2.01 0.977784 2.51 0.993963 3.01 0.998694 3.51 0.999776 
2.02 0.978308 2.52 0.994132 3.02 0.998736 3.52 0.999784 
2.03 0.978822 2.53 0.994297 3.03 0.998777 3.53 0.999792 
2.04 0.979325 2.54 0.994457 3.04 0.998817 3.54 0.999800 
2.05 0.979818 2.55 0.994614 3.05 0.998856 3.55 0.999807 
2.06 0.980301 2.56 0.994766 3.06 0.998893 3.56 0.999815 
2.07 0.980774 2.57 0.994915 3.07 0.998930 3.57 0.999822 
2.08 0.981237 2.58 0.995060 3.08 0.998965 3.58 0.999828 
2.09 0.981691 2.59 0.995201 3.09 0.998999 3.59 0.999835 
2.10 0.982136 2.60 0.995339 3.10 0.999032 3.60 0.999841 
2.11 0.982571 2.61 0.995473 3.11 0.999065 3.61 0.999847 
2.12 0.982997 2.62 0.995604 3.12 0.999096 3.62 0.999853 
2.13 0.983414 2.63 0.995731 3.13 0.999126 3.63 0.999858 
2.14 0.983823 2.64 0.995855 3.14 0.999155 3.64 0.999864 
2.15 0.984222 2.65 0.995975 3.15 0.999184 3.65 0.999869 
2.16 0.984614 2.66 0.996093 3.16 0.999211 3.66 0.999874 
2.17 0.984997 2.67 0.996207 3.17 0.999238 3.67 0.999879 
2.18 0.985371 2.68 0.996319 3.18 0.999264 3.68 0.999883 
2.19 0.985738 2.69 0.996427 3.19 0.999289 3.69 0.999888 
2.20 0.986097 2.70 0.996533 3.20 0.999313 3.70 0.999892 
2.21 0.986447 2.71 0.996636 3.21 0.999336 3.71 0.999896 
2.22 0.986791 2.72 0.996736 3.22 0.999359 3.72 0.999900 
2.23 0.987126 2.73 0.996833 3.23 0.999381 3.73 0.999904 
2.24 0.987455 2.74 0.996928 3.24 0.999402 3.74 0.999908 
2.25 0.987776 2.75 0.997020 3.25 0.999423 3.75 0.999912 
2.26 0.988089 2.76 0.997110 3.26 0.999443 3.76 0.999915 
2.27 0.988396 2.77 0.997197 3.27 0.999462 3.77 0.999918 
2.28 0.988696 2.78 0.997282 3.28 0.999481 3.78 0.999922 
2.29 0.988989 2.79 0.997365 3.29 0.999499 3.79 0.999925 
x Φ(x) x Φ(x) x Φ(x) x Φ(x) 
3.80 0.999928 3.85 0.999941 3.90 0.999952 3.95 0.999961 
3.81 0.999931 3.86 0.999943 3.91 0.999954 3.96 0.999963 
3.82 0.999933 3.87 0.999946 3.92 0.999956 3.97 0.999964 
3.83 0.999936 3.88 0.999948 3.93 0.999958 3.98 0.999966 
3.84 0.999938 3.89 0.999950 3.94 0.999959 3.99 0.999967 
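
The values of $\Phi(x)$ in Table 3 can be recomputed from the error function through the standard identity $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}(x/\sqrt{2})\right]$. The short Python sketch below uses that identity; it is only an illustration, and the name `std_normal_cdf` is an assumption of this example rather than notation from the text.

```python
from math import erf, sqrt


def std_normal_cdf(x: float) -> float:
    """Return Phi(x), the N(0, 1) distribution function, via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Recompute a few entries to six places; compare with the row for x.
for x in (0.61, 1.00, 1.96, 3.00):
    print(x, round(std_normal_cdf(x), 6))

# Negative arguments follow from Phi(-x) = 1 - Phi(x).
print(std_normal_cdf(-1.0), 1.0 - std_normal_cdf(1.0))
```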


Table 4 

Critical Values for Student's t-Distribution 

Let $t_r$ be a random variable having the Student's t-distribution with r degrees of freedom. Then the tabulated quantities are the numbers x for which 

$P(t_r \le x) = \gamma$. 

    γ 
r 0.75 0.90 0.95 0.975 0.99 0.995 

1 1.0000 3.0777 6.3138 12.7062 31.8207 63.6574 
2 0.8165 1.8856 2.9200 4.3027 6.9646 9.9248 
3 0.7649 1.6377 2.3534 3.1824 4.5407 5.8409 
4 0.7407 1.5332 2.1318 2.7764 3.7649 4.6041 
5 0.7267 1.4759 2.0150 2.5706 3.3649 4.0322 
6 0.7176 1.4398 1.9432 2.4469 3.1427 3.7074 
7 0.7111 1.4149 1.8946 2.3646 2.9980 3.4995 
8 0.7064 1.3968 1.8595 2.3060 2.8965 3.3554 
9 0.7027 1.3830 1.8331 2.2622 2.8214 3.2498 
10 0.6998 1.3722 1.8125 2.2281 2.7638 3.1693 
11 0.6974 1.3634 1.7959 2.2010 2.7181 3.1058 
12 0.6955 1.3562 1.7823 2.1788 2.6810 3.0545 
13 0.6938 1.3502 1.7709 2.1604 2.6503 3.0123 
14 0.6924 1.3450 1.7613 2.1448 2.6245 2.9768 
15 0.6912 1.3406 1.7531 2.1315 2.6025 2.9467 
16 0.6901 1.3368 1.7459 2.1199 2.5835 2.9208 
17 0.6892 1.3334 1.7396 2.1098 2.5669 2.8982 
18 0.6884 1.3304 1.7341 2.1009 2.5524 2.8784 
19 0.6876 1.3277 1.7291 2.0930 2.5395 2.8609 
20 0.6870 1.3253 1.7247 2.0860 2.5280 2.8453 
21 0.6864 1.3232 1.7207 2.0796 2.5177 2.8314 
22 0.6858 1.3212 1.7171 2.0739 2.5083 2.8188 
23 0.6853 1.3195 1.7139 2.0687 2.4999 2.8073 
24 0.6848 1.3178 1.7109 2.0639 2.4922 2.7969 
25 0.6844 1.3163 1.7081 2.0595 2.4851 2.7874 
26 0.6840 1.3150 1.7056 2.0555 2.4786 2.7787 
27 0.6837 1.3137 1.7033 2.0518 2.4727 2.7707 
28 0.6834 1.3125 1.7011 2.0484 2.4671 2.7633 
29 0.6830 1.3114 1.6991 2.0452 2.4620 2.7564 
30 0.6828 1.3104 1.6973 2.0423 2.4573 2.7500 
31 0.6825 1.3095 1.6955 2.0395 2.4528 2.7440 
32 0.6822 1.3086 1.6939 2.0369 2.4487 2.7385 
33 0.6820 1.3077 1.6924 2.0345 2.4448 2.7333 
34 0.6818 1.3070 1.6909 2.0322 2.4411 2.7284 
35 0.6816 1.3062 1.6896 2.0301 2.4377 2.7238 
36 0.6814 1.3055 1.6883 2.0281 2.4345 2.7195 
37 0.6812 1.3049 1.6871 2.0262 2.4314 2.7154 
38 0.6810 1.3042 1.6860 2.0244 2.4286 2.7116 
39 0.6808 1.3036 1.6849 2.0227 2.4258 2.7079 
40 0.6807 1.3031 1.6839 2.0211 2.4233 2.7045 
41 0.6805 1.3025 1.6829 2.0195 2.4208 2.7012 
42 0.6804 1.3020 1.6820 2.0181 2.4185 2.6981 
43 0.6802 1.3016 1.6811 2.0167 2.4163 2.6951 
44 0.6801 1.3011 1.6802 2.0154 2.4141 2.6923 
45 0.6800 1.3006 1.6794 2.0141 2.4121 2.6896 
r 0.75 0.90 0.95 0.975 0.99 0.995 
46 0.6799 1.3002 1.6787 2.0129 2.4102 2.6870 
47 0.6797 1.2998 1.6779 2.0117 2.4083 2.6846 
48 0.6796 1.2994 1.6772 2.0106 2.4066 2.6822 
49 0.6795 1.2991 1.6766 2.0096 2.4049 2.6800 
50 0.6794 1.2987 1.6759 2.0086 2.4033 2.6778 
51 0.6793 1.2984 1.6753 2.0076 2.4017 2.6757 
52 0.6792 1.2980 1.6747 2.0066 2.4002 2.6737 
53 0.6791 1.2977 1.6741 2.0057 2.3988 2.6718 
54 0.6791 1.2974 1.6736 2.0049 2.3974 2.6700 
55 0.6790 1.2971 1.6730 2.0040 2.3961 2.6682 
56 0.6789 1.2969 1.6725 2.0032 2.3948 2.6665 
57 0.6788 1.2966 1.6720 2.0025 2.3936 2.6649 
58 0.6787 1.2963 1.6716 2.0017 2.3924 2.6633 
59 0.6787 1.2961 1.6711 2.0010 2.3912 2.6618 
60 0.6786 1.2958 1.6706 2.0003 2.3901 2.6603 
61 0.6785 1.2956 1.6702 1.9996 2.3890 2.6589 
62 0.6785 1.2954 1.6698 1.9990 2.3880 2.6575 
63 0.6784 1.2951 1.6694 1.9983 2.3870 2.6561 
64 0.6783 1.2949 1.6690 1.9977 2.3860 2.6549 
65 0.6783 1.2947 1.6686 1.9971 2.3851 2.6536 
66 0.6782 1.2945 1.6683 1.9966 2.3842 2.6524 
67 0.6782 1.2943 1.6679 1.9960 2.3833 2.6512 
68 0.6781 1.2941 1.6676 1.9955 2.3824 2.6501 
69 0.6781 1.2939 1.6672 1.9949 2.3816 2.6490 
70 0.6780 1.2938 1.6669 1.9944 2.3808 2.6479 
71 0.6780 1.2936 1.6666 1.9939 2.3800 2.6469 
72 0.6779 1.2934 1.6663 1.9935 2.3793 2.6459 
73 0.6779 1.2933 1.6660 1.9930 2.3785 2.6449 
74 0.6778 1.2931 1.6657 1.9925 2.3778 2.6439 
75 0.6778 1.2929 1.6654 1.9921 2.3771 2.6430 
76 0.6777 1.2928 1.6652 1.9917 2.3764 2.6421 
77 0.6777 1.2926 1.6649 1.9913 2.3758 2.6412 
78 0.6776 1.2925 1.6646 1.9908 2.3751 2.6403 
79 0.6776 1.2924 1.6644 1.9905 2.3745 2.6395 
80 0.6776 1.2922 1.6641 1.9901 2.3739 2.6387 
81 0.6775 1.2921 1.6639 1.9897 2.3733 2.6379 
82 0.6775 1.2920 1.6636 1.9893 2.3727 2.6371 
83 0.6775 1.2918 1.6634 1.9890 2.3721 2.6364 
84 0.6774 1.2917 1.6632 1.9886 2.3716 2.6356 
85 0.6774 1.2916 1.6630 1.9883 2.3710 2.6349 
86 0.6774 1.2915 1.6628 1.9879 2.3705 2.6342 
87 0.6773 1.2914 1.6626 1.9876 2.3700 2.6335 
88 0.6773 1.2912 1.6624 1.9873 2.3695 2.6329 
89 0.6773 1.2911 1.6622 1.9870 2.3690 2.6322 
90 0.6772 1.2910 1.6620 1.9867 2.3685 2.6316 


Table 5 Critical Values for the Chi-Square Distribution

Let $\chi^2_r$ be a random variable having the chi-square distribution with r degrees of freedom. Then the tabulated quantities are the numbers x for which

$P(\chi^2_r \le x) = \gamma$.

γ
r 0.005 0.01 0.025 0.05 0.10 0.25
1 — — 0.001 0.004 0.016 0.102 
2 0.010 0.020 0.051 0.103 0.211 0.575 
3 0.072 0.115 0.216 0.352 0.584 1.213 
4 0.207 0.297 0.484 0.711 1.064 1.923 
5 0.412 0.554 0.831 1.145 1.610 2.675 
6 0.676 0.872 1.237 1.635 2.204 3.455 
7 0.989 1.239 1.690 2.167 2.833 4.255 
8 1.344 1.646 2.180 2.733 3.490 5.071 
9 1.735 2.088 2.700 3.325 4.168 5.899
10 2.156 2.558 3.247 3.940 4.865 6.737 
11 2.603 3.053 3.816 4.575 5.578 7.584 
12 3.074 3.571 4.404 5.226 6.304 8.438
13 3.565 4.107 5.009 5.892 7.042 9.299 
14 4.075 4.660 5.629 6.571 7.790 10.165 
15 4.601 5.229 6.262 7.261 8.547 11.037 
16 5.142 5.812 6.908 7.962 9.312 11.912 
17 5.697 6.408 7.564 8.672 10.085 12.792 
18 6.265 7.015 8.231 9.390 10.865 13.675
19 6.844 7.633 8.907 10.117 11.651 14.562 
20 7.434 8.260 9.591 10.851 12.443 15.452 
21 8.034 8.897 10.283 11.591 13.240 16.344 
22 8.643 9.542 10.982 12.338 14.042 17.240 
23 9.260 10.196 11.689 13.091 14.848 18.137 
24 9.886 10.856 12.401 13.848 15.659 19.037 
25 10.520 11.524 13.120 14.611 16.473 19.939 
26 11.160 12.198 13.844 15.379 17.292 20.843
27 11.808 12.879 14.573 16.151 18.114 21.749 
28 12.461 13.565 15.308 16.928 18.939 22.657 
29 13.121 14.257 16.047 17.708 19.768 23.567 
30 13.787 14.954 16.791 18.493 20.599 24.478 
31 14.458 15.655 17.539 19.281 21.434 25.390 
32 15.134 16.362 18.291 20.072 22.271 26.304 
33 15.815 17.074 19.047 20.867 23.110 27.219 
34 16.501 17.789 19.806 21.664 23.952 28.136 
35 17.192 18.509 20.569 22.465 24.797 29.054 
36 17.887 19.233 21.336 23.269 25.643 29.973 
37 18.586 19.960 22.106 24.075 26.492 30.893 
38 19.289 20.691 22.878 24.884 27.343 31.815 
39 19.996 21.426 23.654 25.695 28.196 32.737 
40 20.707 22.164 24.433 26.509 29.051 33.660 
41 21.421 22.906 25.215 27.326 29.907 34.585 
42 22.138 23.650 25.999 28.144 30.765 35.510 
43 22.859 24.398 26.785 28.965 31.625 36.436 
44 23.584 25.148 27.575 29.787 32.487 37.363 
45 24.311 25.901 28.366 30.612 33.350 38.291 
r 0.75 0.90 0.95 0.975 0.99 0.995
1 1.323 2.706 3.841 5.024 6.635 7.879 
2 2.773 4.605 5.991 7.378 9.210 10.597 
3 4.108 6.251 7.815 9.348 11.345 12.838 
4 5.385 7.779 9.488 11.143 13.277 14.860 
5 6.626 9.236 11.071 12.833 15.086 16.750 
6 7.841 10.645 12.592 14.449 16.812 18.548 
7 9.037 12.017 14.067 16.013 18.475 20.278 
8 10.219 13.362 15.507 17.535 20.090 21.955 
9 11.389 14.684 16.919 19.023 21.666 23.589 
10 12.549 15.987 18.307 20.483 23.209 25.188 
11 13.701 17.275 19.675 21.920 24.725 26.757 
12 14.845 18.549 21.026 23.337 26.217 28.299 
13 15.984 19.812 22.362 24.736 27.688 29.819
14 17.117 21.064 23.685 26.119 29.141 31.319 
15 18.245 22.307 24.996 27.488 30.578 32.801 
16 19.369 23.542 26.296 28.845 32.000 34.267 
17 20.489 24.769 27.587 30.191 33.409 35.718 
18 21.605 25.989 28.869 31.526 34.805 37.156 
19 22.718 27.204 30.144 32.852 36.191 38.582 
20 23.828 28.412 31.410 34.170 37.566 39.997 
21 24.935 29.615 32.671 35.479 38.932 41.401 
22 26.039 30.813 33.924 36.781 40.289 42.796 
23 27.141 32.007 35.172 38.076 41.638 44.181 
24 28.241 33.196 36.415 39.364 42.980 45.559 
25 29.339 34.382 37.652 40.646 44.314 46.928 
26 30.435 35.563 38.885 41.923 45.642 48.290 
27 31.528 36.741 40.113 43.194 46.963 49.645 
28 32.620 37.916 41.337 44.461 48.278 50.993
29 33.711 39.087 42.557 45.722 49.588 52.336 
30 34.800 40.256 43.773 46.979 50.892 53.672 
31 35.887 41.422 44.985 48.232 52.191 55.003
32 36.973 42.585 46.194 49.480 53.486 56.328 
33 38.058 43.745 47.400 50.725 54.776 57.648 
34 39.141 44.903 48.602 51.966 56.061 58.964 
35 40.223 46.059 49.802 53.203 57.342 60.275 
36 41.304 47.212 50.998 54.437 58.619 61.581 
37 42.383 48.363 52.192 55.668 59.892 62.883
38 43.462 49.513 53.384 56.896 61.162 64.181 
39 44.539 50.660 54.572 58.120 62.428 65.476 
40 45.616 51.805 55.758 59.342 63.691 66.766 
41 46.692 52.949 56.942 60.561 64.950 68.053 
42 47.766 54.090 58.124 61.777 66.206 69.336 
43 48.840 55.230 59.304 62.990 67.459 70.616 
44 49.913 56.369 60.481 64.201 68.710 71.893 
45 50.985 57.505 61.656 65.410 69.957 73.166 
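
The defining relation of the table, $P(\chi^2_r \le x) = \gamma$, can be inverted numerically to reproduce individual entries. What follows is a minimal Python sketch, not the method by which the table was produced: it assumes the standard series expansion of the regularized incomplete gamma function for the distribution function and plain bisection for the inversion. The names `chi2_cdf` and `chi2_quantile` are illustrative only.

```python
import math


def chi2_cdf(x, r):
    """P(chi-square with r degrees of freedom <= x).

    Uses the series P(a, z) = z^a e^{-z} / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    with a = r/2 and z = x/2 (all terms are positive, so there is no cancellation).
    """
    if x <= 0:
        return 0.0
    a, z = r / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(gamma, r, tol=1e-10):
    """Number x with P(chi-square_r <= x) = gamma, found by bisection."""
    lo, hi = 0.0, float(r) + 10.0
    while chi2_cdf(hi, r) < gamma:  # enlarge the bracket until it contains x
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, r) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Compare with the tabulated entry for r = 10, gamma = 0.95 (18.307).
    print(round(chi2_quantile(0.95, 10), 3))
```

Bisection is slow but needs nothing beyond the monotonicity of the distribution function, which keeps the sketch short and easy to check against the table.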
Table 6 Critical Values for the F-Distribution

Let $F_{r_1,r_2}$ be a random variable having the F-distribution with $r_1$, $r_2$ degrees of freedom. Then the tabulated quantities are the numbers x for which

$P(F_{r_1,r_2} \le x) = \gamma$.





r1
γ 1 2 3 4 5 6 γ
0.500 1.0000 1.5000 1.7092 1.8227 1.8937 1.9422 0.500
0.750 5.8285 7.5000 8.1999 8.5810 8.8198 8.9833 0.750
0.900 39.864 49.500 53.593 55.833 57.241 58.204 0.900
1 0.950 161.45 199.50 215.71 224.58 230.16 233.99 0.950 1
0.975 647.79 799.50 864.16 899.58 921.85 937.11 0.975
0.990 4052.2 4999.5 5403.3 5624.6 5763.7 5859.0 0.990
0.995 16211 20000 21615 22500 23056 23437 0.995
0.500 0.66667 1.0000 1.1349 1.2071 1.2519 1.2824 0.500
0.750 2.5714 3.0000 3.1534 3.2320 3.2799 3.3121 0.750
0.900 8.5263 9.0000 9.1618 9.2434 9.2926 9.3255 0.900
2 0.950 18.513 19.000 19.164 19.247 19.296 19.330 0.950 2
0.975 38.506 39.000 39.165 39.248 39.298 39.331 0.975
0.990 98.503 99.000 99.166 99.249 99.299 99.332 0.990
0.995 198.50 199.00 199.17 199.25 199.30 199.33 0.995
0.500 0.58506 0.88110 1.0000 1.0632 1.1024 1.1289 0.500
0.750 2.0239 2.2798 2.3555 2.3901 2.4095 2.4218 0.750
0.900 5.5383 5.4624 5.3908 5.3427 5.3092 5.2847 0.900
3 0.950 10.128 9.5521 9.2766 9.1172 9.0135 8.9406 0.950 3
0.975 17.443 16.044 15.439 15.101 14.885 14.735 0.975
0.990 34.116 30.817 29.457 28.710 28.237 27.911 0.990
r2 0.995 55.552 49.799 47.467 46.195 45.392 44.838 0.995 r2
0.500 0.54863 0.82843 0.94054 1.0000 1.0367 1.0617 0.500
0.750 1.8074 2.0000 2.0467 2.0642 2.0723 2.0766 0.750
0.900 4.5448 4.3246 4.1908 4.1073 4.0506 4.0098 0.900
4 0.950 7.7086 6.9443 6.5914 6.3883 6.2560 6.1631 0.950 4
0.975 12.218 10.649 9.9792 9.6045 9.3645 9.1973 0.975
0.990 21.198 18.000 16.694 15.977 15.522 15.207 0.990
0.995 31.333 26.284 24.259 23.155 22.456 21.975 0.995
0.500 0.52807 0.79877 0.90715 0.96456 1.0000 1.0240 0.500
0.750 1.6925 1.8528 1.8843 1.8927 1.8947 1.8945 0.750
0.900 4.0604 3.7797 3.6195 3.5202 3.4530 3.4045 0.900
5 0.950 6.6079 5.7861 5.4095 5.1922 5.0503 4.9503 0.950 5
0.975 10.007 8.4336 7.7636 7.3879 7.1464 6.9777 0.975
0.990 16.258 13.274 12.060 11.392 10.967 10.672 0.990
0.995 22.785 18.314 16.530 15.556 14.940 14.513 0.995
0.500 0.51489 0.77976 0.88578 0.94191 0.97654 1.0000 0.500
0.750 1.6214 1.7622 1.7844 1.7872 1.7852 1.7821 0.750
0.900 3.7760 3.4633 3.2888 3.1808 3.1075 3.0546 0.900
6 0.950 5.9874 5.1433 4.7571 4.5337 4.3874 4.2839 0.950 6
0.975 8.8131 7.2598 6.5988 6.2272 5.9876 5.8197 0.975
0.990 13.745 10.925 9.7795 9.1483 8.7459 8.4661 0.990
0.995 18.635 14.544 12.917 12.028 11.464 11.073 0.995
Table 6 (continued)

r1
γ 7 8 9 10 11 12 γ
0.500 1.9774 2.0041 2.0250 2.0419 2.0558 2.0674 0.500 
0.750 9.1021 9.1922 9.2631 9.3202 9.3672 9.4064 0.750 
0.900 58.906 59.439 59.858 60.195 60.473 60.705 0.900 
1 0.950 236.77 238.88 240.54 241.88 242.99 243.91 0.950
0.975 948.22 956.66 963.28 968.63 973.04 976.71 0.975 
0.990 5928.3 5981.1 6022.5 6055.8 6083.3 6106.3 0.990 
0.995 23715 23925 24091 24224 24334 24426 0.995 
0.500 1.3045 1.3213 1.3344 1.3450 1.3537 1.3610 0.500 
0.750 3.3352 3.3526 3.3661 3.3770 3.3859 3.3934 0.750 
0.900 9.3491 9.3668 9.3805 9.3916 9.4006 9.4081 0.900 
2 0.950 19.353 19.371 19.385 19.396 19.405 19.413 0.950 
0.975 39.355 39.373 39.387 39.398 39.407 39.415 0.975 
0.990 99.356 99.374 99.388 99.399 99.408 99.416 0.990 
0.995 199.36 199.37 199.39 199.40 199.41 199.42 0.995 
0.500 1.1482 1.1627 1.1741 1.1833 1.1909 1.1972 0.500 
0.750 2.4302 2.4364 2.4410 2.4447 2.4476 2.4500 0.750 
0.900 5.2662 5.2517 5.2400 5.2304 5.2223 5.2156 0.900 
3 0.950 8.8868 8.8452 8.8123 8.7855 8.7632 8.7446 0.950 
0.975 14.624 14.540 14.473 14.419 14.374 14.337 0.975 
0.990 27.672 27.489 27.345 27.229 27.132 27.052 0.990 
r2 0.995 44.434 44.126 43.882 43.686 43.523 43.387 0.995 r2
0.500 1.0797 1.0933 1.1040 1.1126 1.1196 1.1255 0.500 
0.750 2.0790 2.0805 2.0814 2.0820 2.0823 2.0826 0.750 
0.900 3.9790 3.9549 3.9357 3.9199 3.9066 3.8955 0.900 
4 0.950 6.0942 6.0410 5.9988 5.9644 5.9357 5.9117 0.950 
0.975 9.0741 8.9796 8.9047 8.8439 8.7933 8.7512 0.975 
0.990 14.976 14.799 14.659 14.546 14.452 14.374 0.990 
0.995 21.622 21.352 21.139 20.967 20.824 20.705 0.995 
0.500 1.0414 1.0545 1.0648 1.0730 1.0798 1.0855 0.500 
0.750 1.8935 1.8923 1.8911 1.8899 1.8887 1.8877 0.750 
0.900 3.3679 3.3393 3.3163 3.2974 3.2815 3.2682 0.900 
5 0.950 4.8759 4.8183 4.7725 4.7351 4.7038 4.6777 0.950
0.975 6.8531 6.7572 6.6810 6.6192 6.5676 6.5246 0.975 
0.990 10.456 10.289 10.158 10.051 9.9623 9.8883 0.990 
0.995 14.200 13.961 13.772 13.618 13.490 13.384 0.995 
0.500 1.0169 1.0298 1.0398 1.0478 1.0545 1.0600 0.500 
0.750 1.7789 1.7760 1.7733 1.7708 1.7686 1.7668 0.750 
0.900 3.0145 2.9830 2.9577 2.9369 2.9193 2.9047 0.900 
6 0.950 4.2066 4.1468 4.0990 4.0600 4.0272 3.9999 0.950 
0.975 5.6955 5.5996 5.5234 5.4613 5.4094 5.3662 0.975 
0.990 8.2600 8.1016 7.9761 7.8741 7.7891 7.7183 0.990 
0.995 10.786 10.566 10.391 10.250 10.132 10.034 0.995 


Table 6 (continued)

r1
γ 13 14 15 18 20 24 γ

0.500 2.0773 2.0858 2.0931 2.1104 2.1190 2.1321 0.500 
0.750 9.4399 9.4685 9.4934 9.5520 9.5813 9.6255 0.750
0.900 60.903 61.073 61.220 61.567 61.740 62.002 0.900 

1 0.950 244.69 245.37 245.95 247.32 248.01 249.05 0.950 1
0.975 979.85 982.54 984.87 990.36 993.10 997.25 0.975 
0.990 6125.9 6142.7 6157.3 6191.6 6208.7 6234.6 0.990
0.995 24504 24572 24630 24767 24836 24940 0.995 
0.500 1.3672 1.3725 1.3771 1.3879 1.3933 1.4014 0.500
0.750 3.3997 3.4051 3.4098 3.4208 3.4263 3.4345 0.750 
0.900 9.4145 9.4200 9.4247 9.4358 9.4413 9.4496 0.900 

2 0.950 19.419 19.424 19.429 19.440 19.446 19.454 0.950 2
0.975 39.421 39.426 39.431 39.442 39.448 39.456 0.975 
0.990 99.422 99.427 99.432 99.443 99.449 99.458 0.990 
0.995 199.42 199.43 199.43 199.44 199.45 199.46 0.995
0.500 1.2025 1.2071 1.2111 1.2205 1.2252 1.2322 0.500 
0.750 2.4520 2.4537 2.4552 2.4585 2.4602 2.4626 0.750 
0.900 5.2097 5.2047 5.2003 5.1898 5.1845 5.1764 0.900 

3 0.950 8.7286 8.7148 8.7029 8.6744 8.6602 8.6385 0.950 3 
0.975 14.305 14.277 14.253 14.196 14.167 14.124 0.975
0.990 26.983 26.923 26.872 26.751 26.690 26.598 0.990 

r2 0.995 43.271 43.171 43.085 42.880 42.778 42.622 0.995 r2

0.500 1.1305 1.1349 1.1386 1.1473 1.1517 1.1583 0.500 
0.750 2.0827 2.0828 2.0829 2.0828 2.0828 2.0827 0.750 
0.900 3.8853 3.8765 3.8689 3.8525 3.8443 3.8310 0.900 

4 0.950 5.8910 5.8732 5.8578 5.8209 5.8025 5.7744 0.950 4 
0.975 8.7148 8.6836 8.6565 8.5921 8.5599 8.5109 0.975
0.990 14.306 14.248 14.198 14.079 14.020 13.929 0.990
0.995 20.602 20.514 20.438 20.257 20.167 20.030 0.995
0.500 1.0903 1.0944 1.0980 1.1064 1.1106 1.1170 0.500 
0.750 1.8867 1.8858 1.8851 1.8830 1.8820 1.8802 0.750 
0.900 3.2566 3.2466 3.2380 3.2171 3.2067 3.1905 0.900 

5 0.950 4.6550 4.6356 4.6188 4.5783 4.5581 4.5272 0.950 5
0.975 6.4873 6.4554 6.4277 6.3616 6.3285 6.2780 0.975
0.990 9.8244 9.7697 9.7222 9.6092 9.5527 9.4665 0.990 
0.995 13.292 13.214 13.146 12.984 12.903 12.780 0.995 
0.500 1.0647 1.0687 1.0722 1.0804 1.0845 1.0907 0.500 
0.750 1.7650 1.7634 1.7621 1.7586 1.7569 1.7540 0.750 
0.900 2.8918 2.8808 2.8712 2.8479 2.8363 2.8183 0.900 

6 0.950 3.9761 3.9558 3.9381 3.8955 3.8742 3.8415 0.950 6
0.975 5.3287 5.2966 5.2687 5.2018 5.1684 5.1172 0.975 
0.990 7.6570 7.6045 7.5590 7.4502 7.3958 7.3127 0.990 
0.995 9.9494 9.8769 9.8140 9.6639 9.5888 9.4741 0.995 
Table 6 (continued)

r1
γ 30 40 48 60 120 ∞ γ
0.500 2.1452 2.1584 2.1650 2.1716 2.1848 2.1981 0.500 
0.750 9.6698 9.7144 9.7368 9.7591 9.8041 9.8492 0.750 
0.900 62.265 62.529 62.662 62.794 63.061 63.328 0.900
1 0.950 250.09 251.14 251.67 252.20 253.25 254.32 0.950
0.975 1001.4 1005.6 1007.7 1009.8 1014.0 1018.3 0.975 
0.990 6260.7 6286.8 6299.9 6313.0 6339.4 6366.0 0.990 
0.995 25044 25148 25201 25253 25359 25465 0.995 
0.500 1.4096 1.4178 1.4220 1.4261 1.4344 1.4427 0.500 
0.750 3.4428 3.4511 3.4553 3.4594 3.4677 3.4761 0.750 
0.900 9.4579 9.4663 9.4705 9.4746 9.4829 9.4913 0.900 
2 0.950 19.462 19.471 19.475 19.479 19.487 19.496 0.950 
0.975 39.465 39.473 39.477 39.481 39.490 39.498 0.975 
0.990 99.466 99.474 99.478 99.483 99.491 99.499 0.990 
0.995 199.47 199.47 199.47 199.48 199.49 199.51 0.995 
0.500 1.2393 1.2464 1.2500 1.2536 1.2608 1.2680 0.500 
0.750 2.4650 2.4674 2.4686 2.4697 2.4720 2.4742 0.750 
0.900 5.1681 5.1597 5.1555 5.1512 5.1425 5.1337 0.900 
3 0.950 8.6166 8.5944 8.5832 8.5720 8.5494 8.5265 0.950 
0.975 14.081 14.037 14.015 13.992 13.947 13.902 0.975 
0.990 26.505 26.411 26.364 26.316 26.221 26.125 0.990 
r2 0.995 42.466 42.308 42.229 42.149 41.989 41.829 0.995 r2
0.500 1.1649 1.1716 1.1749 1.1782 1.1849 1.1916 0.500 
0.750 2.0825 2.0821 2.0819 2.0817 2.0812 2.0806 0.750 
0.900 3.8174 3.8036 3.7966 3.7896 3.7753 3.7607 0.900 
4 0.950 5.7459 5.7170 5.7024 5.6878 5.6581 5.6281 0.950 
0.975 8.4613 8.4111 8.3858 8.3604 8.3092 8.2573 0.975 
0.990 13.838 13.745 13.699 13.652 13.558 13.463 0.990 
0.995 19.892 19.752 19.682 19.611 19.468 19.325 0.995 
0.500 1.1234 1.1297 1.1329 1.1361 1.1426 1.1490 0.500 
0.750 1.8784 1.8763 1.8753 1.8742 1.8719 1.8694 0.750 
0.900 3.1741 3.1573 3.1488 3.1402 3.1228 3.1050 0.900
5 0.950 4.4957 4.4638 4.4476 4.4314 4.3984 4.3650 0.950
0.975 6.2269 6.1751 6.1488 6.1225 6.0693 6.0153 0.975 
0.990 9.3793 9.2912 9.2466 9.2020 9.1118 9.0204 0.990
0.995 12.656 12.530 12.466 12.402 12.274 12.144 0.995 
0.500 1.0969 1.1031 1.1062 1.1093 1.1156 1.1219 0.500 
0.750 1.7510 1.7477 1.7460 1.7443 1.7407 1.7368 0.750 
0.900 2.8000 2.7812 2.7716 2.7620 2.7423 2.7222 0.900
6 0.950 3.8082 3.7743 3.7571 3.7398 3.7047 3.6688 0.950 
0.975 5.0652 5.0125 4.9857 4.9589 4.9045 4.8491 0.975
0.990 7.2285 7.1432 7.1000 7.0568 6.9690 6.8801 0.990 
0.995 9.3583 9.2408 9.1814 9.1219 9.0015 8.8793 0.995 
Table 6 (continued)

r1
γ 1 2 3 4 5 6 γ
0.500 0.50572 0.76655 0.87095 0.92619 0.96026 0.98334 0.500
0.750 1.5732 1.7010 1.7169 1.7157 1.7111 1.7059 0.750
0.900 3.5894 3.2574 3.0741 2.9605 2.8833 2.8274 0.900
7 0.950 5.5914 4.7374 4.3468 4.1203 3.9715 3.8660 0.950 7
0.975 8.0727 6.5415 5.8898 5.5226 5.2852 5.1186 0.975
0.990 12.246 9.5466 8.4513 7.8467 7.4604 7.1914 0.990
0.995 16.236 12.404 10.882 10.050 9.5221 9.1554 0.995
0.500 0.49898 0.75683 0.86004 0.91464 0.94831 0.97111 0.500
0.750 1.5384 1.6569 1.6683 1.6642 1.6575 1.6508 0.750
0.900 3.4579 3.1131 2.9238 2.8064 2.7265 2.6683 0.900
8 0.950 5.3177 4.4590 4.0662 3.8378 3.6875 3.5806 0.950 8
0.975 7.5709 6.0595 5.4160 5.0526 4.8173 4.6517 0.975
0.990 11.259 8.6491 7.5910 7.0060 6.6318 6.3707 0.990
0.995 14.688 11.042 9.5965 8.8051 8.3018 7.9520 0.995
0.500 0.49382 0.74938 0.85168 0.90580 0.93916 0.96175 0.500
0.750 1.5121 1.6236 1.6315 1.6253 1.6170 1.6091 0.750
0.900 3.3603 3.0065 2.8129 2.6927 2.6106 2.5509 0.900
9 0.950 5.1174 4.2565 3.8626 3.6331 3.4817 3.3738 0.950 9
0.975 7.2093 5.7147 5.0781 4.7181 4.4844 4.3197 0.975
0.990 10.561 8.0215 6.9919 6.4221 6.0569 5.8018 0.990
r2 0.995 13.614 10.107 8.7171 7.9559 7.4711 7.1338 0.995 r2
0.500 0.48973 0.74349 0.84508 0.89882 0.93193 0.95436 0.500
0.750 1.4915 1.5975 1.6028 1.5949 1.5853 1.5765 0.750
0.900 3.2850 2.9245 2.7277 2.6053 2.5216 2.4606 0.900
10 0.950 4.9646 4.1028 3.7083 3.4780 3.3258 3.2172 0.950 10
0.975 6.9367 5.4564 4.8256 4.4683 4.2361 4.0721 0.975
0.990 10.044 7.5594 6.5523 5.9943 5.6363 5.3858 0.990
0.995 12.826 9.4270 8.0807 7.3428 6.8723 6.5446 0.995
0.500 0.48644 0.73872 0.83973 0.89316 0.92608 0.94837 0.500
0.750 1.4749 1.5767 1.5798 1.5704 1.5598 1.5502 0.750
0.900 3.2252 2.8595 2.6602 2.5362 2.4512 2.3891 0.900
11 0.950 4.8443 3.9823 3.5874 3.3567 3.2039 3.0946 0.950 11
0.975 6.7241 5.2559 4.6300 4.2751 4.0440 3.8807 0.975
0.990 9.6460 7.2057 6.2167 5.6683 5.3160 5.0692 0.990
0.995 12.226 8.9122 7.6004 6.8809 6.4217 6.1015 0.995
0.500 0.48369 0.73477 0.83530 0.88848 0.92124 0.94342 0.500
0.750 1.4613 1.5595 1.5609 1.5503 1.5389 1.5286 0.750
0.900 3.1765 2.8068 2.6055 2.4801 2.3940 2.3310 0.900
12 0.950 4.7472 3.8853 3.4903 3.2592 3.1059 2.9961 0.950 12
0.975 6.5538 5.0959 4.4742 4.1212 3.8911 3.7283 0.975
0.990 9.3302 6.9266 5.9526 5.4119 5.0643 4.8206 0.990
0.995 11.754 8.5096 7.2258 6.5211 6.0711 5.7570 0.995
Table 6 (continued)

r1
γ 7 8 9 10 11 12 γ
0.500 1.0000 1.0126 1.0224 1.0304 1.0369 1.0423 0.500
0.750 1.7011 1.6969 1.6931 1.6898 1.6868 1.6843 0.750 
0.900 2.7849 2.7516 2.7247 2.7025 2.6837 2.6681 0.900 

7 0.950 3.7870 3.7257 3.6767 3.6365 3.6028 3.5747 0.950 7 
0.975 4.9949 4.8994 4.8232 4.7611 4.7091 4.6658 0.975 
0.990 6.9928 6.8401 6.7188 6.6201 6.5377 6.4691 0.990 
0.995 8.8854 8.6781 8.5138 8.3803 8.2691 8.1764 0.995 
0.500 0.98757 1.0000 1.0097 1.0175 1.0239 1.0293 0.500 
0.750 1.6448 1.6396 1.6350 1.6310 1.6274 1.6244 0.750 
0.900 2.6241 2.5893 2.5612 2.5380 2.5184 2.5020 0.900 

8 0.950 3.5005 3.4381 3.3881 3.3472 3.3127 3.2840 0.950 8 
0.975 4.5286 4.4332 4.3572 4.2951 4.2431 4.1997 0.975 
0.990 6.1776 6.0289 5.9106 5.8143 5.7338 5.6668 0.990 
0.995 7.6942 7.4960 7.3386 7.2107 7.1039 7.0149 0.995 
0.500 0.97805 0.99037 1.0000 1.0077 1.0141 1.0194 0.500 
0.750 1.6022 1.5961 1.5909 1.5863 1.5822 1.5788 0.750 
0.900 2.5053 2.4694 2.4403 2.4163 2.3959 2.3789 0.900 

9 0.950 3.2927 3.2296 3.1789 3.1373 3.1022 3.0729 0.950 9 
0.975 4.1971 4.1020 4.0260 3.9639 3.9117 3.8682 0.975 
0.990 5.6129 5.4671 5.3511 5.2565 5.1774 5.1114 0.990

r2 0.995 6.8849 6.6933 6.5411 6.4171 6.3136 6.2274 0.995 r2

0.500 0.97054 0.98276 0.99232 1.0000 1.0063 1.0116 0.500
0.750 1.5688 1.5621 1.5563 1.5513 1.5468 1.5430 0.750
0.900 2.4140 2.3772 2.3473 2.3226 2.3016 2.2841 0.900 

10 0.950 3.1355 3.0717 3.0204 2.9782 2.9426 2.9130 0.950 10
0.975 3.9498 3.8549 3.7790 3.7168 3.6645 3.6209 0.975 
0.990 5.2001 5.0567 4.9424 4.8492 4.7710 4.7059 0.990 
0.995 6.3025 6.1159 5.9676 5.8467 5.7456 5.6613 0.995 
0.500 0.96445 0.97661 0.98610 0.99373 1.0000 1.0052 0.500
0.750 1.5418 1.5346 1.5284 1.5230 1.5181 1.5140 0.750 
0.900 2.3416 2.3040 2.2735 2.2482 2.2267 2.2087 0.900 

11 0.950 3.0123 2.9480 2.8962 2.8536 2.8176 2.7876 0.950 11
0.975 3.7586 3.6638 3.5879 3.5257 3.4733 3.4296 0.975 
0.990 4.8861 4.7445 4.6315 4.5393 4.4619 4.3974 0.990 
0.995 5.8648 5.6821 5.5368 5.4182 5.3190 5.2363 0.995 
0.500 0.95943 0.97152 0.98097 0.98856 0.99480 1.0000 0.500 
0.750 1.5197 1.5120 1.5054 1.4996 1.4945 1.4902 0.750 
0.900 2.2828 2.2446 2.2135 2.1878 2.1658 2.1474 0.900

12 0.950 2.9134 2.8486 2.7964 2.7534 2.7170 2.6866 0.950 12 
0.975 3.6065 3.5118 3.4358 3.3736 3.3211 3.2773 0.975 
0.990 4.6395 4.4994 4.3875 4.2961 4.2193 4.1553 0.990 
0.995 5.5245 5.3451 5.2021 5.0855 4.9878 4.9063 0.995 


Table 6 (continued)

r1
γ 13 14 15 18 20 24 γ
0.500 1.0469 1.0509 1.0543 1.0624 1.0664 1.0724 0.500 
0.750 1.6819 1.6799 1.6781 1.6735 1.6712 1.6675 0.750 
0.900 2.6543 2.6425 2.6322 2.6072 2.5947 2.5753 0.900 

7 0.950 3.5501 3.5291 3.5108 3.4666 3.4445 3.4105 0.950 7
0.975 4.6281 4.5958 4.5678 4.5004 4.4667 4.4150 0.975 
0.990 6.4096 6.3585 6.3143 6.2084 6.1554 6.0743 0.990
0.995 8.0962 8.0274 7.9678 7.8253 7.7540 7.6450 0.995 
0.500 1.0339 1.0378 1.0412 1.0491 1.0531 1.0591 0.500 
0.750 1.6216 1.6191 1.6170 1.6115 1.6088 1.6043 0.750
0.900 2.4875 2.4750 2.4642 2.4378 2.4246 2.4041 0.900 

8 0.950 3.2588 3.2371 3.2184 3.1730 3.1503 3.1152 0.950 8 
0.975 4.1618 4.1293 4.1012 4.0334 3.9995 3.9472 0.975
0.990 5.6085 5.5584 5.5151 5.4111 5.3591 5.2793 0.990 
0.995 6.9377 6.8716 6.8143 6.6769 6.6082 6.5029 0.995 
0.500 1.0239 1.0278 1.0311 1.0390 1.0429 1.0489 0.500 
0.750 1.5756 1.5729 1.5705 1.5642 1.5611 1.5560 0.750 
0.900 2.3638 2.3508 2.3396 2.3121 2.2983 2.2768 0.900

9 0.950 3.0472 3.0252 3.0061 2.9597 2.9365 2.9005 0.950 9 
0.975 3.8302 3.7976 3.7694 3.7011 3.6669 3.6142 0.975 
0.990 5.0540 5.0048 4.9621 4.8594 4.8080 4.7290 0.990 

r2 0.995 6.1524 6.0882 6.0325 5.8987 5.8318 5.7292 0.995 r2

0.500 1.0161 1.0199 1.0232 1.0310 1.0349 1.0408 0.500 
0.750 1.5395 1.5364 1.5338 1.5269 1.5235 1.5179 0.750 
0.900 2.2685 2.2551 2.2435 2.2150 2.2007 2.1784 0.900 

10 0.950 2.8868 2.8644 2.8450 2.7977 2.7740 2.7372 0.950 10 
0.975 3.5827 3.5500 3.5217 3.4530 3.4186 3.3654 0.975
0.990 4.6491 4.6004 4.5582 4.4563 4.4054 4.3269 0.990
0.995 5.5880 5.5252 5.4707 5.3396 5.2740 5.1732 0.995 
0.500 1.0097 1.0135 1.0168 1.0245 1.0284 1.0343 0.500 
0.750 1.5102 1.5069 1.5041 1.4967 1.4930 1.4869 0.750
0.900 2.1927 2.1790 2.1671 2.1377 2.1230 2.1000 0.900

11 0.950 2.7611 2.7383 2.7186 2.6705 2.6464 2.6090 0.950 11 
0.975 3.3913 3.3584 3.3299 3.2607 3.2261 3.1725 0.975 
0.990 4.3411 4.2928 4.2509 4.1496 4.0990 4.0209 0.990
0.995 5.1642 5.1024 5.0489 4.9198 4.8552 4.7557 0.995 
0.500 1.0044 1.0082 1.0115 1.0192 1.0231 1.0289 0.500 
0.750 1.4861 1.4826 1.4796 1.4717 1.4678 1.4613 0.750
0.900 2.1311 2.1170 2.1049 2.0748 2.0597 2.0360 0.900

12 0.950 2.6598 2.6368 2.6169 2.5680 2.5436 2.5055 0.950 12
0.975 3.2388 3.2058 3.1772 3.1076 3.0728 3.0187 0.975 
0.990 4.0993 4.0512 4.0096 3.9088 3.8584 3.7805 0.990 
0.995 4.8352 4.7742 4.7214 4.5937 4.5299 4.4315 0.995
Table 6 (continued)

r1
γ 30 40 48 60 120 ∞ γ
0.500 1.0785 1.0846 1.0877 1.0908 1.0969 1.1031 0.500 
0.750 1.6635 1.6593 1.6571 1.6548 1.6502 1.6452 0.750 
0.900 2.5555 2.5351 2.5247 2.5142 2.4928 2.4708 0.900

7 0.950 3.3758 3.3404 3.3224 3.3043 3.2674 3.2298 0.950 7
0.975 4.3624 4.3089 4.2817 4.2544 4.1989 4.1423 0.975 
0.990 5.9921 5.9084 5.8660 5.8236 5.7372 5.6495 0.990 
0.995 7.5345 7.4225 7.3657 7.3088 7.1933 7.0760 0.995 
0.500 1.0651 1.0711 1.0741 1.0771 1.0832 1.0893 0.500 
0.750 1.5996 1.5945 1.5919 1.5892 1.5836 1.5777 0.750 
0.900 2.3830 2.3614 2.3503 2.3391 2.3162 2.2926 0.900

8 0.950 3.0794 3.0428 3.0241 3.0053 2.9669 2.9276 0.950 8 
0.975 3.8940 3.8398 3.8121 3.7844 3.7279 3.6702 0.975 
0.990 5.1981 5.1156 5.0736 5.0316 4.9460 4.8588 0.990 
0.995 6.3961 6.2875 6.2324 6.1772 6.0649 5.9505 0.995 
0.500 1.0548 1.0608 1.0638 1.0667 1.0727 1.0788 0.500
0.750 1.5506 1.5450 1.5420 1.5389 1.5325 1.5257 0.750 
0.900 2.2547 2.2320 2.2203 2.2085 2.1843 2.1592 0.900 

9 0.950 2.8637 2.8259 2.8066 2.7872 2.7475 2.7067 0.950 9 
0.975 3.5604 3.5055 3.4774 3.4493 3.3918 3.3329 0.975
0.990 4.6486 4.5667 4.5249 4.4831 4.3978 4.3105 0.990 

r2 0.995 5.6248 5.5186 5.4645 5.4104 5.3001 5.1875 0.995 r2

0.500 1.0467 1.0526 1.0556 1.0585 1.0645 1.0705 0.500 
0.750 1.5119 1.5056 1.5023 1.4990 1.4919 1.4843 0.750 
0.900 2.1554 2.1317 2.1195 2.1072 2.0818 2.0554 0.900

10 0.950 2.6996 2.6609 2.6410 2.6211 2.5801 2.5379 0.950 10 
0.975 3.3110 3.2554 3.2269 3.1984 3.1399 3.0798 0.975
0.990 4.2469 4.1653 4.1236 4.0819 3.9965 3.9090 0.990 
0.995 5.0705 4.9659 4.9126 4.8592 4.7501 4.6385 0.995 
0.500 1.0401 1.0460 1.0490 1.0519 1.0578 1.0637 0.500 
0.750 1.4805 1.4737 1.4701 1.4664 1.4587 1.4504 0.750 
0.900 2.0762 2.0516 2.0389 2.0261 1.9997 1.9721 0.900 

11 0.950 2.5705 2.5309 2.5105 2.4901 2.4480 2.4045 0.950 11 
0.975 3.1176 3.0613 3.0324 3.0035 2.9441 2.8828 0.975
0.990 3.9411 3.8596 3.8179 3.7761 3.6904 3.6025 0.990 
0.995 4.6548 4.5508 4.4979 4.4450 4.3367 4.2256 0.995
0.500 1.0347 1.0405 1.0435 1.0464 1.0523 1.0582 0.500
0.750 1.4544 1.4471 1.4432 1.4393 1.4310 1.4221 0.750 
0.900 2.0115 1.9861 1.9729 1.9597 1.9323 1.9036 0.900

12 0.950 2.4663 2.4259 2.4051 2.3842 2.3410 2.2962 0.950 12 
0.975 2.9633 2.9063 2.8771 2.8478 2.7874 2.7249 0.975 
0.990 3.7008 3.6192 3.5774 3.5355 3.4494 3.3608 0.990 
0.995 4.3309 4.2282 4.1756 4.1229 4.0149 3.9039 0.995


Table 6 (continued)

r1
γ 1 2 3 4 5 6 γ
0.500 0.48141 0.73145 0.83159 0.88454 0.91718 0.93926 0.500 
0.750 1.4500 1.5452 1.5451 1.5336 1.5214 1.5105 0.750 
0.900 3.1362 2.7632 2.5603 2.4337 2.3467 2.2830 0.900 

13 0.950 4.6672 3.8056 3.4105 3.1791 3.0254 2.9153 0.950 13 
0.975 6.4143 4.9653 4.3472 3.9959 3.7667 3.6043 0.975 
0.990 9.0738 6.7010 5.7394 5.2053 4.8616 4.6204 0.990 
0.995 11.374 8.1865 6.9257 6.2335 5.7910 5.4819 0.995 
0.500 0.47944 0.72862 0.82842 0.88119 0.91387 0.93573 0.500
0.750 1.4403 1.5331 1.5317 1.5194 1.5066 1.4952 0.750 
0.900 3.1022 2.7265 2.5222 2.3947 2.3069 2.2426 0.900 

14 0.950 4.6001 3.7389 3.3439 3.1122 2.9582 2.8477 0.950 14 
0.975 6.2979 4.8567 4.2417 3.8919 3.6634 3.5014 0.975 
0.990 8.8616 6.5149 5.5639 5.0354 4.6950 4.4558 0.990 
0.995 11.060 7.9216 6.6803 5.9984 5.5623 5.2574 0.995 
0.500 0.47775 0.72619 0.82569 0.87830 0.91073 0.93267 0.500 
0.750 1.4321 1.5227 1.5202 1.5071 1.4938 1.4820 0.750 
0.900 3.0732 2.6952 2.4898 2.3614 2.2730 2.2081 0.900 

15 0.950 4.5431 3.6823 3.2874 3.0556 2.9013 2.7905 0.950 15 
0.975 6.1995 4.7650 4.1528 3.8043 3.5764 3.4147 0.975 
0.990 8.6831 6.3589 5.4170 4.8932 4.5556 4.3183 0.990 

r2 0.995 10.798 7.7008 6.4760 5.8029 5.3721 5.0708 0.995 r2

0.500 0.47628 0.72406 0.82330 0.87578 0.90812 0.93001 0.500 
0.750 1.4249 1.5137 1.5103 1.4965 1.4827 1.4705 0.750 
0.900 3.0481 2.6682 2.4618 2.3327 2.2438 2.1783 0.900 

16 0.950 4.4940 3.6337 3.2389 3.0069 2.8524 2.7413 0.950 16 
0.975 6.1151 4.6867 4.0768 3.7294 3.5021 3.3406 0.975 
0.990 8.5310 6.2262 5.2922 4.7726 4.4374 4.2016 0.990 
0.995 10.575 7.5138 6.3034 5.6378 5.2117 4.9134 0.995 
0.500 0.47499 0.72219 0.82121 0.87357 0.90584 0.92767 0.500 
0.750 1.4186 1.5057 1.5015 1.4873 1.4730 1.4605 0.750 
0.900 3.0262 2.6446 2.4374 2.3077 2.2183 2.1524 0.900 

17 0.950 4.4513 3.5915 3.1968 2.9647 2.8100 2.6987 0.950 17 
0.975 6.0420 4.6189 4.0112 3.6648 3.4379 3.2767 0.975 
0.990 8.3997 6.1121 5.1850 4.6690 4.3359 4.1015 0.990 
0.995 10.384 7.3536 6.1556 5.4967 5.0746 4.7789 0.995
0.500 0.47385 0.72053 0.81936 0.87161 0.90381 0.92560 0.500 
0.750 1.4130 1.4988 1.4938 1.4790 1.4644 1.4516 0.750 
0.900 3.0070 2.6239 2.4160 2.2858 2.1958 2.1296 0.900

18 0.950 4.4139 3.5546 3.1599 2.9277 2.7729 2.6613 0.950 18 
0.975 5.9781 4.5597 3.9539 3.6083 3.3820 3.2209 0.975 
0.990 8.2854 6.0129 5.0919 4.5790 4.2479 4.0146 0.990 
0.995 10.218 7.2148 6.0277 5.3746 4.9560 4.6627 0.995 


Table 6 (continued)

r1
γ 7 8 9 10 11 12 γ
0.500 0.95520 0.96724 0.97665 0.98421 0.99042 0.99560 0.500 
0.750 1.5011 1.4931 1.4861 1.4801 1.4746 1.4701 0.750 
0.900 2.2341 2.1953 2.1638 2.1376 2.1152 2.0966 0.900

13 0.950 2.8321 2.7669 2.7144 2.6710 2.6343 2.6037 0.950 13 
0.975 3.4827 3.3880 3.3120 3.2497 3.1971 3.1532 0.975 
0.990 4.4410 4.3021 4.1911 4.1003 4.0239 3.9603 0.990 
0.995 5.2529 5.0761 4.9351 4.8199 4.7234 4.6429 0.995 
0.500 0.95161 0.96360 0.97298 0.98051 0.98670 0.99186 0.500 
0.750 1.4854 1.4770 1.4697 1.4634 1.4577 1.4530 0.750 
0.900 2.1931 2.1539 2.1220 2.0954 2.0727 2.0537 0.900 

14 0.950 2.7642 2.6987 2.6458 2.6021 2.5651 2.5342 0.950 14
0.975 3.3799 3.2853 3.2093 3.1469 3.0941 3.0501 0.975
0.990 4.2779 4.1399 4.0297 3.9394 3.8634 3.8001 0.990 
0.995 5.0313 4.8566 4.7173 4.6034 4.5078 4.4281 0.995 
0.500 0.94850 0.96046 0.96981 0.97732 0.98349 0.98863 0.500 
0.750 1.4718 1.4631 1.4556 1.4491 1.4432 1.4383 0.750
0.900 2.1582 2.1185 2.0862 2.0593 2.0363 2.0171 0.900 

15 0.950 2.7066 2.6408 2.5876 2.5437 2.5064 2.4753 0.950 15 
0.975 3.2934 3.1987 3.1227 3.0602 3.0073 2.9633 0.975 
0.990 4.1415 4.0045 3.8948 3.8049 3.7292 3.6662 0.990 

r2 0.995 4.8473 4.6743 4.5364 4.4236 4.3288 4.2498 0.995 r2

0.500 0.94580 0.95773 0.96705 0.97454 0.98069 0.98582 0.500 
0.750 1.4601 1.4511 1.4433 1.4366 1.4305 1.4255 0.750 
0.900 2.1280 2.0880 2.0553 2.0281 2.0048 1.9854 0.900 

16 0.950 2.6572 2.5911 2.5377 2.4935 2.4560 2.4247 0.950 16 
0.975 3.2194 3.1248 3.0488 2.9862 2.9332 2.8890 0.975 
0.990 4.0259 3.8896 3.7804 3.6909 3.6155 3.5527 0.990 
0.995 4.6920 4.5207 4.3838 4.2719 4.1778 4.0994 0.995 
0.500 0.94342 0.95532 0.96462 0.97209 0.97823 0.98334 0.500 
0.750 1.4497 1.4405 1.4325 1.4256 1.4194 1.4142 0.750 
0.900 2.1017 2.0613 2.0284 2.0009 1.9773 1.9577 0.900 

17 0.950 2.6143 2.5480 2.4943 2.4499 2.4122 2.3807 0.950 17 
0.975 3.1556 3.0610 2.9849 2.9222 2.8691 2.8249 0.975 
0.990 3.9267 3.7910 3.6822 3.5931 3.5179 3.4552 0.990 
0.995 4.5594 4.3893 4.2535 4.1423 4.0488 3.9709 0.995 
0.500 0.94132 0.95319 0.96247 0.96993 0.97606 0.98116 0.500 
0.750 1.4406 1.4312 1.4230 1.4159 1.4095 1.4042 0.750
0.900 2.0785 2.0379 2.0047 1.9770 1.9532 1.9333 0.900 

18 0.950 2.5767 2.5102 2.4563 2.4117 2.3737 2.3421 0.950 18 
0.975 3.0999 3.0053 2.9291 2.8664 2.8132 2.7689 0.975 
0.990 3.8406 3.7054 3.5971 3.5082 3.4331 3.3706 0.990 
0.995 4.4448 4.2759 4.1410 4.0305 3.9374 3.8599 0.995 


These tables have been adapted from Donald B. Owen's Handbook of Statistical Tables, published 
by Addison-Wesley, by permission of the publishers. 
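
For $r_1 = 2$ the F distribution function has the closed form $P(F_{2,r_2} \le x) = 1 - (1 + 2x/r_2)^{-r_2/2}$, $x \ge 0$. This is a standard result stated here as an aid; it is not taken from the tables themselves. Solving for x gives $x = \frac{r_2}{2}\left[(1-\gamma)^{-2/r_2} - 1\right]$, so the column $r_1 = 2$ of Table 6 can be spot-checked without numerical integration. The Python sketch below does this for a few entries copied from that column; the function name and the list of sample entries are illustrative choices.

```python
def f_quantile_r1_equals_2(gamma, r2):
    """Number x with P(F_{2, r2} <= x) = gamma, from the closed form for r1 = 2."""
    return (r2 / 2.0) * ((1.0 - gamma) ** (-2.0 / r2) - 1.0)


# (gamma, r2, tabulated value) for a few entries in the column r1 = 2 of Table 6.
SAMPLE_ENTRIES = [
    (0.950, 2, 19.000),
    (0.900, 3, 5.4624),
    (0.990, 4, 18.000),
    (0.500, 6, 0.77976),
    (0.995, 12, 8.5096),
]

if __name__ == "__main__":
    for gamma, r2, tabulated in SAMPLE_ENTRIES:
        computed = f_quantile_r1_equals_2(gamma, r2)
        print(gamma, r2, tabulated, round(computed, 5))
```

For other values of $r_1$ no such elementary formula is available in general, and the entries have to be obtained numerically.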
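Table 7 Table of Selected Discrete and Continuous Distributions and Some of Their Characteristics

PROBABILITY DENSITY FUNCTIONS IN ONE VARIABLE

Distribution: Binomial, B(n, p)
Probability Density Function: $f(x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, \ldots, n$; $0 < p < 1$, $q = 1 - p$
Mean: $np$
Variance: $npq$

Distribution: (Bernoulli, B(1, p))
Probability Density Function: $f(x) = p^x q^{1-x}$, $x = 0, 1$
Mean: $p$
Variance: $pq$

Distribution: Geometric
Probability Density Function: $f(x) = p q^{x-1}$, $x = 1, 2, \ldots$; $0 < p < 1$, $q = 1 - p$
Mean: $\frac{1}{p}$
Variance: $\frac{q}{p^2}$

Distribution: Poisson, P(λ)
Probability Density Function: $f(x) = e^{-\lambda} \frac{\lambda^x}{x!}$, $x = 0, 1, \ldots$; $\lambda > 0$
Mean: $\lambda$
Variance: $\lambda$

Distribution: Hypergeometric
Probability Density Function: $f(x) = \frac{\binom{m}{x}\binom{n}{r-x}}{\binom{m+n}{r}}$, $x = 0, 1, \ldots, r$, where $\binom{m}{r} = 0$ for $r > m$
Mean: $\frac{mr}{m+n}$
Variance: $\frac{mnr(m+n-r)}{(m+n)^2(m+n-1)}$

Distribution: Gamma
Probability Density Function: $f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} \exp\left(-\frac{x}{\beta}\right)$, $x > 0$; $\alpha, \beta > 0$
Mean: $\alpha\beta$
Variance: $\alpha\beta^2$

Distribution: Negative Exponential
Probability Density Function: $f(x) = \lambda \exp(-\lambda x)$, $x > 0$; $\lambda > 0$; or $f(x) = \frac{1}{\mu} e^{-x/\mu}$, $x > 0$; $\mu > 0$
Mean: $\frac{1}{\lambda}$; or $\mu$
Variance: $\frac{1}{\lambda^2}$; or $\mu^2$

Distribution: Chi-Square
Probability Density Function: $f(x) = \frac{1}{\Gamma\left(\frac{r}{2}\right) 2^{r/2}} x^{\frac{r}{2}-1} \exp\left(-\frac{x}{2}\right)$, $x > 0$; $r > 0$ integer
Mean: $r$
Variance: $2r$

Distribution: Normal, N(μ, σ²)
Probability Density Function: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, $x \in \mathbb{R}$; $\mu \in \mathbb{R}$, $\sigma > 0$
Mean: $\mu$
Variance: $\sigma^2$

Distribution: (Standard Normal, N(0, 1))
Probability Density Function: $f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$, $x \in \mathbb{R}$
Mean: $0$
Variance: $1$

Distribution: Uniform, U(α, β)
Probability Density Function: $f(x) = \frac{1}{\beta - \alpha}$, $\alpha \le x \le \beta$; $-\infty < \alpha < \beta < \infty$
Mean: $\frac{\alpha+\beta}{2}$
Variance: $\frac{(\alpha-\beta)^2}{12}$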
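
The mean and variance entries for the discrete distributions can be checked by direct summation of $x f(x)$ and $(x - \text{mean})^2 f(x)$ over the support. The Python sketch below does this for the binomial ($np$, $npq$), geometric ($1/p$, $q/p^2$), Poisson ($\lambda$, $\lambda$) and hypergeometric ($mr/(m+n)$, $mnr(m+n-r)/[(m+n)^2(m+n-1)]$) cases. The helper names and the parameter values are illustrative assumptions, and the infinite supports are truncated where the remaining probability mass is negligible.

```python
import math


def moments(pmf, support):
    """Return (mean, variance) of a discrete distribution by direct summation."""
    xs = list(support)
    mean = sum(x * pmf(x) for x in xs)
    variance = sum((x - mean) ** 2 * pmf(x) for x in xs)
    return mean, variance


def binomial_pmf(n, p):
    return lambda x: math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def geometric_pmf(p):
    # Support x = 1, 2, ...
    return lambda x: p * (1 - p) ** (x - 1)


def poisson_pmf(lam):
    return lambda x: math.exp(-lam) * lam ** x / math.factorial(x)


def hypergeometric_pmf(m, n, r):
    # math.comb(a, b) is 0 when b > a, matching the convention in the table.
    return lambda x: math.comb(m, x) * math.comb(n, r - x) / math.comb(m + n, r)


if __name__ == "__main__":
    n, p = 10, 0.3
    print("binomial      ", moments(binomial_pmf(n, p), range(n + 1)),
          "expected", (n * p, n * p * (1 - p)))
    p = 0.2
    print("geometric     ", moments(geometric_pmf(p), range(1, 2001)),
          "expected", (1 / p, (1 - p) / p ** 2))
    lam = 2.5
    print("Poisson       ", moments(poisson_pmf(lam), range(100)),
          "expected", (lam, lam))
    m, n, r = 6, 4, 3
    print("hypergeometric", moments(hypergeometric_pmf(m, n, r), range(r + 1)),
          "expected", (m * r / (m + n),
                       m * n * r * (m + n - r) / ((m + n) ** 2 * (m + n - 1))))
```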
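Table 7 (continued)

PROBABILITY DENSITY FUNCTIONS IN MANY VARIABLES

Distribution: Multinomial
Probability Density Function: $f(x_1, \ldots, x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} \times p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, $x_j \ge 0$ integers, $x_1 + x_2 + \cdots + x_k = n$; $p_j > 0$, $j = 1, 2, \ldots, k$, $p_1 + p_2 + \cdots + p_k = 1$
Means: $np_1, \ldots, np_k$
Variances: $np_1q_1, \ldots, np_kq_k$; $q_j = 1 - p_j$, $j = 1, \ldots, k$

Distribution: Bivariate Normal
Probability Density Function: $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{q}{2}\right)$, where
$q = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]$,
$x_1, x_2 \in \mathbb{R}$; $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1, \sigma_2 > 0$, $-1 < \rho < 1$, $\rho$ = correlation coefficient
Means: $\mu_1, \mu_2$
Variances: $\sigma_1^2, \sigma_2^2$

Distribution: k-Variate Normal, N(μ, Σ)
Probability Density Function: $f(\mathbf{x}) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$, $\mathbf{x} \in \mathbb{R}^k$; $\boldsymbol{\mu} \in \mathbb{R}^k$, $\Sigma$: $k \times k$ nonsingular symmetric matrix
Means: $\mu_1, \ldots, \mu_k$
Covariance matrix: $\Sigma$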
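Table 7 (continued)

Distribution: Moment Generating Function

Binomial, $B(n, p)$: $M(t) = (pe^t + q)^n$, $t \in \Re$
(Bernoulli, $B(1, p)$: $M(t) = pe^t + q$, $t \in \Re$)
Geometric: $M(t) = \frac{pe^t}{1 - qe^t}$, $t < -\log q$
Poisson, $P(\lambda)$: $M(t) = \exp(\lambda e^t - \lambda)$, $t \in \Re$
Hypergeometric: (no entry)
Gamma: $M(t) = \frac{1}{(1 - \beta t)^{\alpha}}$, $t < \frac{1}{\beta}$
Negative Exponential: $M(t) = \frac{\lambda}{\lambda - t}$, $t < \lambda$; or $M(t) = \frac{1}{1 - \mu t}$, $t < \frac{1}{\mu}$
Chi-Square: $M(t) = \frac{1}{(1 - 2t)^{r/2}}$, $t < \frac{1}{2}$
Normal, $N(\mu, \sigma^2)$: $M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$, $t \in \Re$
(Standard Normal, $N(0, 1)$: $M(t) = \exp\left(\frac{t^2}{2}\right)$, $t \in \Re$)
Uniform, $U(\alpha, \beta)$: $M(t) = \frac{e^{t\beta} - e^{t\alpha}}{t(\beta - \alpha)}$, $t \in \Re$
Multinomial: $M(t_1, \ldots, t_k) = (p_1 e^{t_1} + \cdots + p_k e^{t_k})^n$, $t_1, \ldots, t_k \in \Re$
Bivariate Normal: $M(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right]$, $t_1, t_2 \in \Re$
k-Variate Normal, $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$: $M(\mathbf{t}) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}'\boldsymbol{\Sigma}\mathbf{t}\right)$, $\mathbf{t} \in \Re^k$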


Some Notation and Abbreviations

$\Re$    real line
$\Re^k$, $k \ge 1$    k-dimensional Euclidean space
$\uparrow$, $\downarrow$    increasing (nondecreasing) and decreasing (nonincreasing), respectively
$S$    sample space; also, sure (or certain) event
$\emptyset$    empty set; also, impossible event
$A \subseteq B$    event A is contained in event B (event A implies event B)
$A^c$    complement of event A
$A \cup B$    union of events A and B
$A \cap B$    intersection of events A and B
$A - B$    difference of events A and B (in this order)
r.v.    random variable
$I_A$    indicator of the set A: $I_A(x) = 1$ if $x \in A$, $I_A(x) = 0$ if $x \notin A$
$(X \in B) = X^{-1}(B)$    inverse image of the set B under X: $X^{-1}(B) = \{s \in S;\ X(s) \in B\}$
$X(S)$    range of X
$P$    probability function (measure)
$P(A)$    probability of the event A
$P_X$    probability distribution of X (or just distribution of X)
$F_X$    distribution function (d.f.) of X
$f_X$    probability density function (p.d.f.) of X
$P(A|B)$    conditional probability of A, given B
$\binom{n}{k}$    combinations of n objects taken k at a time
$P_{n,k}$    permutations of n objects taken k at a time
$n!$    n factorial
$EX$ or $\mu(X)$ or $\mu_X$ or just $\mu$    expectation (mean value, mean) of X
$Var(X)$ or $\sigma^2(X)$ or $\sigma_X^2$ or just $\sigma^2$    variance of X
$\sqrt{Var(X)}$ or $\sigma(X)$ or $\sigma_X$ or just $\sigma$    standard deviation (s.d.) of X
$M_X$ or just $M$    moment generating function (m.g.f.) of X
$B(n, p)$    Binomial distribution with parameters n and p
$P(\lambda)$    Poisson distribution with parameter $\lambda$
$\chi^2_r$    Chi-Square distribution with r degrees of freedom (d.f.)
$N(\mu, \sigma^2)$    Normal distribution with parameters $\mu$ and $\sigma^2$
$\Phi$    distribution function (d.f.) of the standard N(0, 1) distribution
$U(\alpha, \beta)$ or $R(\alpha, \beta)$    Uniform (or Rectangular) distribution with parameters $\alpha$ and $\beta$
$X \sim B(n, p)$ etc.    the r.v. X has the distribution indicated
$\chi^2_{r;\alpha}$    the point for which $P(X > \chi^2_{r;\alpha}) = \alpha$, $X \sim \chi^2_r$
$z_\alpha$    the point for which $P(Z > z_\alpha) = \alpha$, where $Z \sim N(0, 1)$
$P_{X_1,\ldots,X_n}$ or $P_{\mathbf{X}}$    joint probability distribution of the r.v.'s $X_1, \ldots, X_n$ or probability distribution of the random vector $\mathbf{X}$
$F_{X_1,\ldots,X_n}$ or $F_{\mathbf{X}}$    joint d.f. of the r.v.'s $X_1, \ldots, X_n$ or d.f. of the random vector $\mathbf{X}$
$f_{X_1,\ldots,X_n}$ or $f_{\mathbf{X}}$    joint p.d.f. of the r.v.'s $X_1, \ldots, X_n$ or p.d.f. of the random vector $\mathbf{X}$
$M_{X_1,\ldots,X_n}$ or $M_{\mathbf{X}}$    joint m.g.f. of the r.v.'s $X_1, \ldots, X_n$ or m.g.f. of the random vector $\mathbf{X}$
i.i.d. (r.v.'s)    independent identically distributed (r.v.'s)
$f_{X|Y}(\cdot|Y = y)$ or $f_{X|Y}(\cdot|y)$    conditional p.d.f. of X, given Y = y
$E(X|Y = y)$    conditional expectation of X, given Y = y
$Var(X|Y = y)$ or $\sigma^2(X|Y = y)$    conditional variance of X, given Y = y
$Cov(X, Y)$    covariance of X and Y
$\rho(X, Y)$ or $\rho_{X,Y}$    correlation coefficient of X and Y
$t_r$    (Student's) t distribution with r degrees of freedom (d.f.)
$t_{r;\alpha}$    the point for which $P(X > t_{r;\alpha}) = \alpha$, $X \sim t_r$
$F_{r_1,r_2}$    F distribution with $r_1$ and $r_2$ degrees of freedom (d.f.)
$F_{r_1,r_2;\alpha}$    the point for which $P(X > F_{r_1,r_2;\alpha}) = \alpha$, $X \sim F_{r_1,r_2}$
$X_{(j)}$ or $Y_j$    jth order statistic of $X_1, \ldots, X_n$
$\xrightarrow{P}$, $\xrightarrow{d}$, $\xrightarrow{q.m.}$    convergence in probability, distribution, quadratic mean, respectively
WLLN    Weak Law of Large Numbers
CLT    Central Limit Theorem
$\theta$    letter used for a one-dimensional parameter
$\boldsymbol{\theta}$    symbol used for a multidimensional parameter
$\Omega$    letter used for a parameter space
ML    maximum likelihood
MLE    maximum likelihood estimate
UMV    uniformly minimum variance
UMVU    uniformly minimum variance unbiased
LS    least squares
LSE    least squares estimate
$H_0$    null hypothesis
$H_A$    alternative hypothesis
$\varphi$    letter used for a test function
$\alpha$    letter used for level of significance
$\beta(\theta)$ or $\beta(\boldsymbol{\theta})$    probability of type II error at $\theta$ ($\boldsymbol{\theta}$)
$\pi(\theta)$ or $\pi(\boldsymbol{\theta})$    power of a test at $\theta$ ($\boldsymbol{\theta}$)
MP    most powerful (test)
UMP    uniformly most powerful (test)
LR    likelihood ratio
$\lambda = \lambda(x_1, \ldots, x_n)$    likelihood ratio test function
$\log x$    the logarithm of x (> 0) with base always e whether it is so explicitly stated or not





Answers to 
Even-Numbered 
Exercises 


Section 1.2 
2.2 (i) S = {(r, r, r), (r, r, b), (r, r, g), (r, b, r), (r, b, b), (r, b, g), (r, g, r),
(r, g, b), (r, g, g), (b, r, r), (b, r, b), (b, r, g), (b, b, r), (b, b, b),
(b, b, g), (b, g, r), (b, g, b), (b, g, g), (g, r, r), (g, r, b), (g, r, g),
(g, b, r), (g, b, b), (g, b, g), (g, g, r), (g, g, b), (g, g, g)}.
(ii) A = {(r, b, g), (r, g, b), (b, r, g), (b, g, r), (g, r, b), (g, b, r)},
B = {(r, r, b), (r, r, g), (r, b, r), (r, b, b), (r, g, r), (r, g, g), (b, r, r),
(b, r, b), (b, b, r), (b, b, g), (b, g, b), (b, g, g), (g, r, r),
(g, r, g), (g, b, b), (g, b, g), (g, g, r), (g, g, b)},
C = A ∪ B = S − {(r, r, r), (b, b, b), (g, g, g)}.


2.4 (i) Denoting by (x1, x2) the cars sold in the first and the second sale, we
have:

S = {(a1, a1), (a1, a2), (a1, a3), (a2, a1), (a2, a2), (a2, a3), (a3, a1),
(a3, a2), (a3, a3), (a1, b1), (a1, b2), (a2, b1), (a2, b2), (a3, b1),
(a3, b2), (a1, c), (a2, c), (a3, c), (b1, a1), (b1, a2), (b1, a3), (b2, a1),
(b2, a2), (b2, a3), (b1, b1), (b1, b2), (b2, b1), (b2, b2), (b1, c), (b2, c),
(c, a1), (c, a2), (c, a3), (c, b1), (c, b2), (c, c)}.

(ii) A = {(a1, a1), (a1, a2), (a1, a3), (a2, a1), (a2, a2), (a2, a3), (a3, a1),
(a3, a2), (a3, a3)},
B = {(a1, b1), (a1, b2), (a2, b1), (a2, b2), (a3, b1), (a3, b2)},
C = B ∪ {(b1, a1), (b1, a2), (b1, a3), (b2, a1), (b2, a2), (b2, a3)},
D = {(c, b1), (c, b2), (b1, c), (b2, c)}.
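2.6 $E = A^c$, $F = C - D = C \cap D^c$, $G = B - C = B \cap C^c$,
$H = A^c - B = A^c \cap B^c = (A \cup B)^c$, $I = B^c$.
2.8 (i) $B_0 = A_1^c \cap A_2^c \cap A_3^c$.
(ii) $B_1 = (A_1 \cap A_2^c \cap A_3^c) \cup (A_1^c \cap A_2 \cap A_3^c) \cup (A_1^c \cap A_2^c \cap A_3)$.
(iii) $B_2 = (A_1 \cap A_2 \cap A_3^c) \cup (A_1 \cap A_2^c \cap A_3) \cup (A_1^c \cap A_2 \cap A_3)$.
(iv) $B_3 = A_1 \cap A_2 \cap A_3$.
(v) $C = B_0 \cup B_1 \cup B_2$.
(vi) $D = B_1 \cup B_2 \cup B_3 = A_1 \cup A_2 \cup A_3$.
2.10 If $A = \emptyset$, then $A \cap B^c = \emptyset$, $A^c \cap B = S \cap B = B$, so that $(A \cap B^c) \cup
(A^c \cap B) = B$ for every B. Next, let $(A \cap B^c) \cup (A^c \cap B) = B$ and take
$B = \emptyset$ to obtain $A \cap B^c = A$, $A^c \cap B = \emptyset$, so that $A = \emptyset$.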
2.12 $A \subseteq B$ implies that, for every $s \in A$, we have $s \in B$, whereas $B \subseteq C$ implies
that, for every $s \in B$, we have $s \in C$. Thus, for every $s \in A$, we have
$s \in C$, so that $A \subseteq C$.


2.14 For $s \in \bigcup_j A_j$, let $j_0 \ge 1$ be the first j for which $s \in A_{j_0}$. Then, if $j_0 = 1$,
it follows that $s \in A_1$ and therefore s belongs in the right-hand side of
the relation. If $j_0 > 1$, then $s \notin A_j$, $j = 1, \ldots, j_0 - 1$, but $s \in A_{j_0}$, so that
$s \in A_1^c \cap \cdots \cap A_{j_0-1}^c \cap A_{j_0}$ and hence s belongs to the right-hand side of the
relation. Next, let s belong to the right-hand side event. Then, if $s \in A_1$,
it follows that $s \in \bigcup_j A_j$. If $s \notin A_j$ for $j = 1, \ldots, j_0 - 1$ but $s \in A_{j_0}$, it
follows that $s \in \bigcup_j A_j$. The identity is established.


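2.16 (i) Since $-5 + \frac{1}{n+1} < -5 + \frac{1}{n}$ and $20 - \frac{1}{n} < 20 - \frac{1}{n+1}$, it follows that
$\left(-5 + \frac{1}{n},\ 20 - \frac{1}{n}\right) \subset \left(-5 + \frac{1}{n+1},\ 20 - \frac{1}{n+1}\right)$, or $A_n \subset A_{n+1}$, so that $\{A_n\}$ is
increasing. Likewise, $7 + \frac{3}{n+1} < 7 + \frac{3}{n}$, so that $\left(0,\ 7 + \frac{3}{n+1}\right) \subset \left(0,\ 7 + \frac{3}{n}\right)$,
or $B_{n+1} \subset B_n$; thus, $\{B_n\}$ is decreasing.
(ii) $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} \left(-5 + \frac{1}{n},\ 20 - \frac{1}{n}\right) = (-5, 20)$, and $\bigcap_{n=1}^{\infty} B_n =
\bigcap_{n=1}^{\infty} \left(0,\ 7 + \frac{3}{n}\right) = (0, 7]$.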


Section 1.3


3.2 Each one of the r.v.'s $X_i$, i = 1, 2, 3 takes on the values: 0, 1, 2, 3 and
$X_1 + X_2 + X_3 = 3$.
3.4 X takes on the values: −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7,

(X ≤ 2) = {(−3, 0), (−3, 1), (−3, 2), (−3, 3), (−3, 4), (−2, 0), (−2, 1),
(−2, 2), (−2, 3), (−2, 4), (−1, 0), (−1, 1), (−1, 2), (−1, 3),
(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)},

(3 < X ≤ 5) = (4 ≤ X ≤ 5) = (X = 4 or X = 5)
= {(0, 4), (1, 3), (1, 4), (2, 2), (2, 3), (3, 1), (3, 2)},
(X > 6) = (X ≥ 7) = {(3, 4)}.
3.6 (i) S = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2),
(3, 3), (3, 4), (4, 1), (4, 2), (4, 3), (4, 4)}.
(ii) The values of X are: 2, 3, 4, 5, 6, 7, 8.
(iii) (X ≤ 3) = (X = 2 or X = 3) = {(1, 1), (1, 2), (2, 1)},
(2 ≤ X < 5) = (2 ≤ X ≤ 4) = (X = 2 or X = 3 or X = 4) =
{(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)}, (X > 8) = Ø.
3.8 (i) S = [8:00, 8:15]. 
(ii) The values of X consist of the interval [8:00, 8:15]. 
(iii) The event described is the interval [8:10, 8:15]. 


Section 2.1


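1.2 Since $A \cup B \supseteq A$, we have $P(A \cup B) \ge P(A) = \frac{3}{4}$. Also, $A \cap B \subseteq B$ implies
$P(A \cap B) \le P(B) = \frac{3}{8}$. Finally, $P(A \cap B) = P(A) + P(B) - P(A \cup B) =
\frac{3}{4} + \frac{3}{8} - P(A \cup B) = \frac{9}{8} - P(A \cup B) \ge \frac{9}{8} - 1 = \frac{1}{8}$.


1.4 We have: $A^c \cap B = B \cap A^c = B - A$ and $A \subseteq B$. Therefore $P(A^c \cap B) =
P(B - A) = P(B) - P(A) = \frac{5}{12} - \frac{1}{4} = \frac{1}{6} \approx 0.167$. Likewise,
$A^c \cap C = C - A$ with $A \subseteq C$, so that $P(A^c \cap C) = P(C - A) =
P(C) - P(A) = \frac{7}{12} - \frac{1}{4} = \frac{1}{3} \approx 0.333$, $B^c \cap C = C - B$ with $B \subseteq C$,
so that $P(B^c \cap C) = P(C - B) = P(C) - P(B) = \frac{7}{12} - \frac{5}{12} = \frac{1}{6} \approx 0.167$.
Next, $A \cap B^c \cap C^c = A \cap (B^c \cap C^c) = A \cap (B \cup C)^c = A \cap C^c = A - C = \emptyset$,
so that $P(A \cap B^c \cap C^c) = 0$, and $A^c \cap B^c \cap C^c = (A \cup B \cup C)^c = C^c$, so
that $P(A^c \cap B^c \cap C^c) = P(C^c) = 1 - P(C) = 1 - \frac{7}{12} = \frac{5}{12} \approx 0.417$.


1.6 The event A is defined as follows: A = "x = 7n, n = 1, ..., 28," so that
$P(A) = \frac{28}{200} = 0.14$. Likewise, B = "x = 3n + 10, n = 1, ..., 63,"
so that $P(B) = \frac{63}{200} = 0.315$, and C = "$x^2 + 1 \le 375$" = "$x^2 \le 374$" =
"$x \le \sqrt{374}$" = "$x \le 19$," and then $P(C) = \frac{19}{200} = 0.095$.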


1.8 Denote by A, B, and C the events that a student reads news magazines
A, B, and C, respectively. Then the required probability is $P(A^c \cap B^c \cap C^c)$.
However,


$P(A^c \cap B^c \cap C^c) = P((A \cup B \cup C)^c) = 1 - P(A \cup B \cup C)$
$= 1 - [P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C)
- P(B \cap C) + P(A \cap B \cap C)]$
= 1 − (0.20 + 0.15 + 0.10 − 0.05 − 0.04 − 0.03 + 0.02)
= 1 − 0.35 = 0.65.
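
The arithmetic in answer 1.8 can be checked with a short sketch. The function name `prob_reads_none` is hypothetical and only illustrates the inclusion-exclusion step; the seven input probabilities are the ones quoted in the answer.

```python
def prob_reads_none(p_a, p_b, p_c, p_ab, p_ac, p_bc, p_abc):
    """Return P(A^c and B^c and C^c) = 1 - P(A or B or C).

    P(A or B or C) is expanded by inclusion-exclusion for three events.
    """
    p_union = p_a + p_b + p_c - p_ab - p_ac - p_bc + p_abc
    return 1.0 - p_union


# Probabilities as quoted in the answer to Exercise 1.8.
answer = prob_reads_none(0.20, 0.15, 0.10, 0.05, 0.04, 0.03, 0.02)
print(f"{answer:.2f}")  # 0.65
```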


1.10 From the definition of A, B, and C, we have: 
A = {(0, 4), (0, 6), (1, 3), (1, 5), (1, 9), (2, 2), (2, 4), (2, 8), (3, 1), (3, 3),
(3, 7), (4, 0), (4, 2), (4, 6), (5, 1), (5, 5), (6, 0), (6, 4)},

B = {(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)},

C = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0),
(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1),
(2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 0), (3, 1), (3, 2),
(3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 0), (4, 1), (4, 2), (4, 3),
(4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4),
(5, 6), (5, 7), (5, 8), (5, 9), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5),
(6, 7), (6, 8), (6, 9)},

or

$C^c$ = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.


Therefore, since the number of points in S is 7 × 10 = 70, we have:


$P(A) = \frac{18}{70} = \frac{9}{35} \approx 0.257$, $P(B) = \frac{5}{70} = \frac{1}{14} \approx 0.071$,
$P(C) = \frac{63}{70} = \frac{9}{10} = 0.9$, or $P(C) = 1 - P(C^c) = 1 - \frac{7}{70} = \frac{63}{70} = 0.9$.


Section 2.2
2.2 (i) For 0 < x ≤ 2, $f(x) = \frac{d}{dx}\left[2c\left(x^2 - \frac{1}{3}x^3\right)\right] = 2c(2x - x^2)$. Thus,


$f(x) = 2c(2x - x^2)$, 0 < x ≤ 2 (and 0 elsewhere).
(ii) From $\int_0^2 2c(2x - x^2)\,dx = 1$, we get $\frac{8c}{3} = 1$, so that c = 3/8.
2.4 (i) [Figure omitted.]
(ii) P(X < 6.5) = 0.7, P(X > 8.1) = 1 − P(X ≤ 8.1) = 1 − 0.9 = 0.1,
P(5 < X < 8) = P(X < 8) − P(X ≤ 5) = 0.7 − 0.4 = 0.3.


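2.6 (i) We need two relations which are provided by: $\int_0^1 (cx + d)\,dx = 1$ and
$\int_{1/2}^1 (cx + d)\,dx = 1/3$, or: c + 2d = 2 and 9c + 12d = 8, and hence
$c = -\frac{4}{3}$, $d = \frac{5}{3}$.
(ii) For 0 ≤ x ≤ 1, $F(x) = \int_0^x \left(-\frac{4}{3}t + \frac{5}{3}\right)dt = -\frac{2x^2}{3} + \frac{5x}{3}$. Thus,
F(x) = 0 for x < 0; $F(x) = -\frac{2x^2}{3} + \frac{5x}{3}$ for 0 ≤ x ≤ 1; F(x) = 1 for x > 1.

2.8 From $\sum_{x=0}^{\infty} c\alpha^x = c\sum_{x=0}^{\infty} \alpha^x = c \times \frac{1}{1-\alpha} = 1$, we get $c = 1 - \alpha$.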


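2.10 (i) $\sum_{x=0}^{\infty} c\left(\frac{1}{3}\right)^x = c \times \frac{1}{1 - \frac{1}{3}} = \frac{3c}{2} = 1$ and $c = \frac{2}{3}$.
(ii) $P(X \ge 3) = \frac{2}{3}\sum_{x=3}^{\infty} \left(\frac{1}{3}\right)^x = \frac{2}{3} \times \frac{1/3^3}{2/3} = \frac{1}{27} \approx 0.037$.
2.12 (i) $\int_0^{\infty} ce^{-cx}\,dx = -\int_0^{\infty} de^{-cx} = -e^{-cx}\big|_0^{\infty} = -(0 - 1) = 1$ for every c > 0.
(ii) $P(X \ge 10) = \int_{10}^{\infty} ce^{-cx}\,dx = -e^{-cx}\big|_{10}^{\infty} = -(0 - e^{-10c}) = e^{-10c}$.
(iii) $P(X \ge 10) = 0.5$ implies $e^{-10c} = \frac{1}{2}$, so that $-10c = -\log 2$ and
$c = \frac{1}{10}\log 2 \approx \frac{0.693}{10} \approx 0.069$.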


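2.14 (i) From $\sum_{x=0}^{\infty} \frac{c}{3^x} = c\sum_{x=0}^{\infty} \frac{1}{3^x} = c \times \frac{1}{1 - \frac{1}{3}} = \frac{3c}{2} = 1$, we get $c = \frac{2}{3}$.

(ii) $P(X \ge 3) = c\sum_{x=3}^{\infty} \frac{1}{3^x} = c \times \frac{1/3^3}{1 - \frac{1}{3}} = c \times \frac{1}{2 \times 3^2} = \frac{2}{3} \times \frac{1}{18} = \frac{1}{27} \approx 0.037$.

(iii) $P(X = 2k + 1,\ k = 0, 1, \ldots) = c\sum_{k=0}^{\infty} \frac{1}{3^{2k+1}} = c\left(\frac{1}{3} + \frac{1}{3^3} + \frac{1}{3^5} + \cdots\right) =
c \times \frac{1/3}{1 - \frac{1}{9}} = c \times \frac{3}{8} = \frac{2}{3} \times \frac{3}{8} = 0.25$.
(iv) $P(X = 3k + 1,\ k = 0, 1, \ldots) = c\sum_{k=0}^{\infty} \frac{1}{3^{3k+1}} = c\left(\frac{1}{3} + \frac{1}{3^4} + \frac{1}{3^7} + \cdots\right) =
c \times \frac{1/3}{1 - \frac{1}{27}} = c \times \frac{9}{26} = \frac{2}{3} \times \frac{9}{26} = \frac{3}{13} \approx 0.231$.


2.16 (i) P(no items are sold) = $f(0) = \frac{1}{2} = 0.5$.
(ii) P(more than 3 items are sold) = $\sum_{x=4}^{\infty} \left(\frac{1}{2}\right)^{x+1} = \left(\frac{1}{2}\right)^5 \times \frac{1}{1 - \frac{1}{2}} = \frac{1}{16} =
0.0625$.
(iii) P(an odd number of items are sold) = $\left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^4 + \left(\frac{1}{2}\right)^6 + \cdots =
\left(\frac{1}{2}\right)^2 \times \frac{1}{1 - \frac{1}{4}} = \frac{1}{3} \approx 0.333$.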


2.18 (i) Since a cae "dx = —cxe "e — e “| = 1 for all c > 0, the 
given function is a p.d.f. for all c > 0. 
(ii) From part (i), 
c(t+1) 


P(X > t) =-cxe |" —e "1 = c(te + e) = oa 


Gii) Here c(t + 1) = 0.2 x 11 = 2.2, ct = 0.2 x 10 = 2, so that GD = 


22 ~ 0.297. 
2.20 We have: 
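$P(X > x_0) = \int_{x_0}^1 n(1 - x)^{n-1}\,dx = -\int_{x_0}^1 d(1 - x)^n
= -(1 - x)^n\big|_{x_0}^1 = (1 - x_0)^n$, and it is given that this probability
is $1/10^{2n}$. Thus,
$(1 - x_0)^n = \left(\frac{1}{100}\right)^n$, or $1 - x_0 = \frac{1}{100}$, and $x_0 = 0.99$.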
Section 2.3


3.2 We have: $P(A|A \cup B) = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = \frac{P(A)}{P(A \cup B)} = \frac{P(A)}{P(A) + P(B)}$ (since $A \cap B =
\emptyset$), and likewise, $P(B|A \cup B) = \frac{P(B \cap (A \cup B))}{P(A \cup B)} = \frac{P(B)}{P(A) + P(B)}$.


3.4 (i) $P(b_2|b_1) = 15/26 \approx 0.577$; (ii) $P(g_2|g_1) = 13/24 \approx 0.542$;
(iii) $P(b_2) = 0.52$; (iv) $P(b_1 \cap g_2) = 0.22$.


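3.6 Parts (i) and (ii) follow without any calculations by using the fact that
$P(\cdot|B)$ and $P(\cdot|C)$ are probability functions, or directly as follows:

(i) $P(A^c|B) = \frac{P(A^c \cap B)}{P(B)} = \frac{P(B - A \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = 1 - \frac{P(A \cap B)}{P(B)}
= 1 - P(A|B)$.
(ii) $P(A \cup B|C) = \frac{P((A \cup B) \cap C)}{P(C)} = \frac{P((A \cap C) \cup (B \cap C))}{P(C)}
= \frac{P(A \cap C) + P(B \cap C) - P(A \cap B \cap C)}{P(C)} = \frac{P(A \cap C)}{P(C)} + \frac{P(B \cap C)}{P(C)} - \frac{P((A \cap B) \cap C)}{P(C)}
= P(A|C) + P(B|C) - P(A \cap B|C)$.
(iii) In the sample space S = {HHH, HHT, HTH, THH, HTT, THT, TTH,
TTT} with all outcomes being equally likely, define the events:

A = "the # of H's is ≤ 2" = {TTT, TTH, THT, HTT, THH,
HTH, HHT},
B = "the # of H's is > 1" = {HHT, HTH, THH, HHH}.

Then $B^c$ = {HTT, THT, TTH, TTT}, $A \cap B^c = B^c$, $A \cap B$ = {HHT,
HTH, THH}, so that:
$P(A|B^c) = \frac{P(A \cap B^c)}{P(B^c)} = \frac{P(B^c)}{P(B^c)} = 1 \ne 1 - P(A|B) = 1 - \frac{P(A \cap B)}{P(B)} = 1 - \frac{3/8}{4/8} = 1 - \frac{3}{4} = \frac{1}{4}$.

(iv) In the sample space S = {1, 2, 3, 4, 5} with all outcomes being equally
likely, consider the events A = {1, 2}, B = {3, 4}, and C = {2, 3}, so
that $A \cap B = \emptyset$ and $A \cup B$ = {1, 2, 3, 4}. Then:

$P(C|A \cup B) = \frac{P(C \cap (A \cup B))}{P(A \cup B)} = \frac{2/5}{4/5} = \frac{2}{4} = \frac{1}{2}$, whereas
$P(C|A) = \frac{P(A \cap C)}{P(A)} = \frac{1/5}{2/5} = \frac{1}{2}$, $P(C|B) = \frac{P(B \cap C)}{P(B)} = \frac{1/5}{2/5} = \frac{1}{2}$, so that

$P(C|A \cup B) \ne P(C|A) + P(C|B)$.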


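3.8 For n = 2, the theorem is true since $P(A_2|A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)}$ yields $P(A_1 \cap A_2)
= P(A_2|A_1)P(A_1)$. Next, assume $P(A_1 \cap \cdots \cap A_k) = P(A_k|A_1 \cap \cdots \cap A_{k-1}) \cdots
P(A_2|A_1)P(A_1)$ and show that $P(A_1 \cap \cdots \cap A_{k+1}) = P(A_{k+1}|A_1
\cap \cdots \cap A_k) P(A_k|A_1 \cap \cdots \cap A_{k-1}) \cdots P(A_2|A_1)P(A_1)$. Indeed, $P(A_1 \cap \cdots \cap
A_{k+1}) = P((A_1 \cap \cdots \cap A_k) \cap A_{k+1}) = P(A_{k+1}|A_1 \cap \cdots \cap A_k) P(A_1 \cap \cdots \cap A_k)$ (by
applying the theorem for two events $A_1 \cap \cdots \cap A_k$ and $A_{k+1}$) $= P(A_{k+1}|A_1
\cap \cdots \cap A_k) P(A_k|A_1 \cap \cdots \cap A_{k-1}) \cdots P(A_2|A_1)P(A_1)$ (by the induction
hypothesis).


3.10 With obvious notation, we have: P(1st white and 4th white) $= P(W_1 \cap
W_2 \cap W_3 \cap W_4) + P(W_1 \cap W_2 \cap B_3 \cap W_4) + P(W_1 \cap B_2 \cap W_3 \cap W_4) + P(W_1 \cap B_2 \cap
B_3 \cap W_4) = P(W_4|W_1 \cap W_2 \cap W_3)P(W_3|W_1 \cap W_2)P(W_2|W_1)P(W_1) + P(W_4|
W_1 \cap W_2 \cap B_3)P(B_3|W_1 \cap W_2)P(W_2|W_1)P(W_1) + P(W_4|W_1 \cap B_2 \cap W_3)P(W_3|
W_1 \cap B_2)P(B_2|W_1)P(W_1) + P(W_4|W_1 \cap B_2 \cap B_3)P(B_3|W_1 \cap B_2)P(B_2|W_1)
P(W_1) = \frac{7}{12} \times \frac{8}{13} \times \frac{9}{14} \times \frac{10}{15} + \frac{8}{12} \times \frac{5}{13} \times \frac{9}{14} \times \frac{10}{15} + \frac{8}{12} \times \frac{9}{13} \times \frac{5}{14} \times \frac{10}{15} + \frac{9}{12} \times \frac{4}{13} \times
\frac{5}{14} \times \frac{10}{15} = \frac{1}{12 \times 13 \times 14 \times 15}(7 \times 8 \times 9 \times 10 + 5 \times 8 \times 9 \times 10 \times 2 + 4 \times 5 \times 9 \times 10)
= \frac{9 \times 10 \times 156}{12 \times 13 \times 14 \times 15} = \frac{3}{7} \approx 0.429$.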

3.12 (i) P(+) = 0.01188; (ii) POIH = Wy = 0.16. 


3.14 Let I = "switch I is open," II = "switch II is open," S = "signal goes
through." Then: (i) $P(S) = 0.48$; (ii) $P(I|S^c) = \frac{5}{13} \approx 0.385$; (iii) $P(II|S^c) =
\frac{10}{13} \approx 0.769$.


3.16 With F = "an individual is female," M = "an individual is male," C = "an
individual is color-blind," we have:
P(F) = 0.52, P(M) = 0.48, P(C|F) = 0.25, P(C|M) = 0.05, and therefore
$P(C) = 0.154$, $P(M|C) = \frac{12}{77} \approx 0.156$.
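
As a quick check of answer 3.16, the total probability and Bayes' formula can be evaluated directly. This is a minimal sketch; the helper name `posterior` and the dictionary layout are assumptions made for illustration, while the four input probabilities are those quoted in the answer.

```python
def posterior(priors, likelihoods, group):
    """Return (P(C), P(group | C)) from priors P(g) and likelihoods P(C | g)."""
    # Total Probability Theorem: P(C) = sum over groups of P(g) * P(C | g).
    p_c = sum(priors[g] * likelihoods[g] for g in priors)
    # Bayes' formula for the requested group.
    return p_c, priors[group] * likelihoods[group] / p_c


priors = {"F": 0.52, "M": 0.48}        # P(F), P(M)
likelihoods = {"F": 0.25, "M": 0.05}   # P(C|F), P(C|M)
p_c, p_m_given_c = posterior(priors, likelihoods, "M")
print(f"{p_c:.3f} {p_m_given_c:.3f}")  # 0.154 0.156
```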


3.18 With obvious notation, we have:

(i) $P(D) = 0.029$; (ii) $P(I|D) = \frac{12}{29} \approx 0.414$; (iii) $P(II|D) = \frac{9}{29} \approx 0.310$,
and $P(III|D) = \frac{8}{29} \approx 0.276$.
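3.20 (i) $P(X > t) = \int_t^{\infty} \lambda e^{-\lambda x}\,dx = -\int_t^{\infty} de^{-\lambda x} = -e^{-\lambda x}\big|_t^{\infty} = e^{-\lambda t}$.

(ii) $P(X > s + t|X > s) = \frac{P(X > s + t,\ X > s)}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)}
= \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}$ (by part (i))
$= e^{-\lambda t}$.

(iii) The conditional probability that X is greater than t units beyond s,
given that it has been greater than s, does not depend on s and is the
same as the (unconditional) probability that X is greater than t. That
is, this distribution has some sort of "memoryless" property.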
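
The memoryless property in answer 3.20 can be illustrated numerically. The sketch below assumes the Negative Exponential survival function $P(X > t) = e^{-\lambda t}$ from part (i); the function names and the sample values of s, t, and the rate are hypothetical.

```python
import math


def survival(t, lam):
    """P(X > t) for the Negative Exponential distribution with rate lam."""
    return math.exp(-lam * t)


def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)


# The conditional probability should not depend on s.
lam = 0.5
for s in (0.0, 1.0, 4.0):
    print(s, conditional_survival(s, 3.0, lam), survival(3.0, lam))
```

Each printed row should show the same value in the last two columns, up to floating-point rounding.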
4.2 Here $P(A) = P(A \cap A) = P(A)P(A) = [P(A)]^2$, and this happens if
$P(A) = 0$, whereas, if $P(A) \ne 0$, it happens only if $P(A) = 1$.


4.4 Since $P(A_1 \cap A_2) = P(A_1)P(A_2)$, we have to show that:


$P(A_1 \cap (B_1 \cup B_2)) = P(A_1)P(B_1 \cup B_2)$, $P(A_2 \cap (B_1 \cup B_2))
= P(A_2)P(B_1 \cup B_2)$, $P(A_1 \cap A_2 \cap (B_1 \cup B_2))
= P(A_1)P(A_2)P(B_1 \cup B_2)$.


Indeed, $P(A_1 \cap (B_1 \cup B_2)) = P((A_1 \cap B_1) \cup (A_1 \cap B_2))
= P(A_1 \cap B_1) + P(A_1 \cap B_2) = P(A_1)P(B_1) + P(A_1)P(B_2)
= P(A_1)P(B_1 \cup B_2)$, and similarly for $P(A_2 \cap (B_1 \cup B_2))$.
Finally,


$P(A_1 \cap A_2 \cap (B_1 \cup B_2)) = P((A_1 \cap A_2 \cap B_1) \cup (A_1 \cap A_2 \cap B_2))
= P(A_1 \cap A_2 \cap B_1) + P(A_1 \cap A_2 \cap B_2)
= P(A_1)P(A_2)P(B_1) + P(A_1)P(A_2)P(B_2)
= P(A_1)P(A_2)P(B_1 \cup B_2)$.


4.6 (i) Clearly, $A = (A \cap B \cap C) \cup (A \cap B^c \cap C) \cup (A \cap B \cap C^c) \cup (A \cap B^c \cap C^c)$
and hence P(A) = 0.6875. Likewise, P(B) = 0.4375, P(C) = 0.5625.
(ii) A, B, and C are not independent.
(iii) $P(A \cap B) = \frac{4}{16}$, and then $P(A|B) = \frac{4}{7} \approx 0.571$.
(iv) A and B are not independent.
4.8 (i) S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, A = {HHH,
TTT} with $P(A) = p^3 + q^3$ (q = 1 − p).
(ii) P(A) = 0.28.
4.10 (i) c= 1/25. 
(ii) See figure.
(iii) P(A) = P(X > 5) = 0.50, P(B) = P(5 < X < 7.5) = 0.375.
(iv) P(B|A) = 0.75; (v) A and B are not independent.
412 G) P(WNAOR N- -OR ¿NR )U(W OWN RIN: -ORE 3N Rnr-2) 
U---U(W¿N---AW¿_¿NW,-1NR,)) = 0.54 yO oa 
(ii) For n = 5, the probability in part (i) is 0.0459. 
4.14 (i) P(no circuit is closed) $= (1 - p_1) \cdots (1 - p_n)$.
(ii) P(at least 1 circuit is closed) $= 1 - (1 - p_1) \cdots (1 - p_n)$.
(iii) P(exactly 1 circuit is closed) $= p_1(1 - p_2) \cdots (1 - p_n) + (1 - p_1)p_2 \times
(1 - p_3) \cdots (1 - p_n) + \cdots + (1 - p_1) \cdots (1 - p_{n-1})p_n$.
(iv) The answers above are: $(1 - p)^n$, $1 - (1 - p)^n$, $np(1 - p)^{n-1}$.
(v) The numerical values are: 0.01024, 0.98976, 0.0768.
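
The formulas of answer 4.14 can be sketched in a few lines. The function name `circuit_probabilities` is hypothetical, and the values n = 5 and p = 0.6 are an assumption inferred from the numerical answers in part (v) (they reproduce 0.01024, 0.98976, and 0.0768); the exercise statement itself is not reproduced here.

```python
from math import prod


def circuit_probabilities(ps):
    """Return (P(none closed), P(at least one closed), P(exactly one closed)).

    ps[i] is the probability that circuit i is closed; the circuits are
    assumed to be independent, as in parts (i)-(iii) of the answer.
    """
    none_closed = prod(1 - p for p in ps)
    at_least_one = 1 - none_closed
    # Sum over i of: circuit i closed and every other circuit open.
    exactly_one = sum(
        p_i * prod(1 - p_j for j, p_j in enumerate(ps) if j != i)
        for i, p_i in enumerate(ps)
    )
    return none_closed, at_least_one, exactly_one


# Assumed values (inferred from part (v)): n = 5 circuits, common p = 0.6.
none_closed, at_least_one, exactly_one = circuit_probabilities([0.6] * 5)
print(round(none_closed, 5), round(at_least_one, 5), round(exactly_one, 4))
```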
5.2 (i) 3 × 4 × 5 = 60; (ii) 1 × 2 × 5 = 10; (iii) 3 × 4 × 1 = 12.
5.4 (i) 3 × 2 × 3 × 2 × 3 = 108; (ii) 3 × 2 × 2 × 1 × 1 = 12.


5.6 $2^n$; $2^5 = 32$, $2^{10} = 1{,}024$, $2^{15} = 32{,}768$, $2^{20} = 1{,}048{,}576$,
$2^{25} = 33{,}554{,}432$.


5.8 The required probability is: 34, = 0.003. 





5.10 Start with $\binom{n+1}{m+1} \big/ \binom{n}{m}$, expand in terms of factorials, do the cancellations,
and you end up with (n + 1)/(m + 1).


5.12 Selecting r out of m + n in $\binom{m+n}{r}$ ways is equivalent to selecting x out
of m in $\binom{m}{x}$ ways and r − x out of n in $\binom{n}{r-x}$ ways where x = 0, 1, ..., r.
Then $\binom{m+n}{r} = \sum_{x=0}^{r} \binom{m}{x}\binom{n}{r-x}$.

5.14 The required number is $\binom{n}{3}$, which for n = 10 becomes $\binom{10}{3} = 120$.

5.16 The required probability is: 

(3) x (3) x C) x (i) _ 2,480,625 


5 66,661,386 





= 0.037. 


n-1 
5.18 The required probability is: E =l= 
5.20 The required probability is: $(0.5)^{2n}\binom{2n}{n}$, which for n = 5 becomes:
$252 \times (0.5)^{10} \approx 0.246$.
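
The value in answer 5.20 can be reproduced with the standard library. This is a small sketch under the reading that the probability is $\binom{2n}{n}(0.5)^{2n}$; the function name `central_binomial_probability` is hypothetical.

```python
from math import comb


def central_binomial_probability(n):
    """Return C(2n, n) * 0.5**(2n), the probability of exactly n successes
    in 2n independent trials with success probability 0.5."""
    return comb(2 * n, n) * 0.5 ** (2 * n)


print(comb(10, 5))                                 # 252
print(round(central_binomial_probability(5), 3))   # 0.246
```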





5.22 The required probability is: $\sum_{x=5}^{10} \binom{10}{x}(0.2)^x(0.8)^{10-x} = 0.03279$.
5.24 (a) © (Ey; G) 1- (E; Gi Terres. 





a CR). a: E), aa COCCI 
(b) G) O Gi) 1 (ty)? (iii) © . 





Section 3.1


1.2 (i) $EX = 0$, $EX^2 = c^2$, and $Var(X) = c^2$.
(ii) $P(|X - EX| \le c) = P(-c \le X \le c) = P(X = -c \text{ or } X = c) = 1 =
\frac{c^2}{c^2} = \frac{Var(X)}{c^2}$.


1.4 If Y is the net loss to the company, then EY = $600, and if P is the 
premium to be charged, then P = $700. 
1.6 $Var(X) = EX^2 - (EX)^2$, by expanding and taking expectations. Also,
$E[X(X - 1)] = Var(X) + (EX)^2 - EX$ by expanding, taking expectations,
and using the first result. That $Var(X) = E[X(X - 1)] + EX - (EX)^2$
follows from the first two results.

1.8 (i) $EX = 2$, $E[X(X - 1)] = 4$; (ii) $Var(X) = 2$.

1.10 $EX = \frac{4}{3}$, $EX^2 = 2$, so that $Var(X) = \frac{2}{9}$ and s.d. of $X = \frac{\sqrt{2}}{3} \approx 0.471$.

1.12 $c_1 = -1/12$, $c_2 = 5/3$.

1.14 (i) By adding and subtracting $\mu$, we get: $E(X - c)^2 = Var(X) + (\mu - c)^2$;
(ii) Immediate from part (i).

1.16 (i) $\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \arctan x\big|_{-\infty}^{\infty} = \arctan(\infty) - \arctan(-\infty) = \pi$, so that
$\int_{-\infty}^{\infty} \frac{1}{\pi} \times \frac{dx}{1 + x^2} = 1$.

(ii) $\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x\,dx}{1 + x^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{d(1 + x^2)}{1 + x^2} = \frac{1}{2\pi}\log(1 + x^2)\big|_{-\infty}^{\infty} = \frac{1}{2\pi}(\infty - \infty)$.

1.18 For the discrete case, $X \ge c$ means $x_i \ge c$ for all values $x_i$ of X.
Then $x_i f_X(x_i) \ge c f_X(x_i)$ and hence $\sum_{x_i} x_i f_X(x_i) \ge \sum_{x_i} c f_X(x_i)$. But
$\sum_{x_i} x_i f_X(x_i) = EX$ and $\sum_{x_i} c f_X(x_i) = c\sum_{x_i} f_X(x_i) = c$. Thus, $EX \ge c$.
The particular case follows, of course, by taking c = 0. In the continuous
case, summation signs are replaced by integrals.





Section 3.2
2.2 (i) c=0/V1—a; (ii) c = 5 = 4.464. 


2.4 (i) By the Tchebichev inequality, $P(|X - \mu| \ge c) = 0$ for all c > 0.

(ii) Consider a sequence $0 < c_n \downarrow 0$ as $n \to \infty$. Then $P(|X - \mu| \ge c_n) = 0$
for all n, or equivalently, $P(|X - \mu| < c_n) = 1$ for all n, whereas,
clearly, $\{(|X - \mu| < c_n)\}$ is a nonincreasing sequence of events and
its limit is $\bigcap_{n=1}^{\infty} (|X - \mu| < c_n)$. Then, by Theorem 2 in Chapter 2,
$1 = \lim_{n \to \infty} P(|X - \mu| < c_n) = P\left(\bigcap_{n=1}^{\infty} (|X - \mu| < c_n)\right)$. However, it
is clear that $\bigcap_{n=1}^{\infty} (|X - \mu| < c_n) = (|X - \mu| \le 0) = (X = \mu)$. Thus,
$P(X = \mu) = 1$, as was to be seen.


Section 3.3
3.2 (i) It follows by using the identity $\binom{n+1}{x} = \binom{n}{x} + \binom{n}{x-1}$.
(ii) B(26, 0.25; 10) = 0.050725.
3.4 If X is the number of those favoring the proposal, then X ~ B(15, 0.4375). 
Therefore: (i) P(X > 5) = 0.859; (ii) P(X > 8) = 0.3106. 


3.6 If X is the number of times the bull's eye is hit, then X ~ B(100, p). 
Therefore: 
(i) $P(X \ge 40) = \sum_{x=40}^{100} \binom{100}{x} p^x q^{100-x}$ (q = 1 − p).
(ii) $P(X \ge 40) = \sum_{x=40}^{100} \binom{100}{x} (0.25)^x (0.75)^{100-x}$.
(iii) $EX = np = 100p$, $Var(X) = npq = 100pq$, and for p = 0.25,
$EX = 25$, $Var(X) = 18.75$, s.d. of $X = \sqrt{18.75} \approx 4.33$.
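
The Binomial quantities in answer 3.6 can be evaluated directly. The sketch below uses n = 100 and p = 0.25 as in part (iii); the function names are hypothetical, and the tail is computed as $P(X \ge k)$, which is an assumption about the inequality intended in parts (i) and (ii).

```python
from math import comb, sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of B(n, p)."""
    q = 1 - p
    return n * p, n * p * q, sqrt(n * p * q)


def binomial_upper_tail(n, p, k):
    """Return P(X >= k) for X ~ B(n, p) by summing the p.d.f. from k to n."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q ** (n - x) for x in range(k, n + 1))


mean, var, sd = binomial_summary(100, 0.25)
print(mean, var, round(sd, 2))            # 25.0 18.75 4.33
print(binomial_upper_tail(100, 0.25, 40)) # exact tail sum of parts (i)-(ii)
```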


3.8 From the Tchebichev inequality n = 8,000. 
3.10 (i) Writing $\binom{n}{x}$ in terms of factorials, and after cancellations, we get:
$EX = np\sum_{y=0}^{n-1} \binom{n-1}{y} p^y q^{(n-1)-y} = np \times 1 = np$. Likewise,
$E[X(X - 1)] = n(n - 1)p^2 \sum_{y=0}^{n-2} \binom{n-2}{y} p^y q^{(n-2)-y} = n(n - 1)p^2 \times 1
= n(n - 1)p^2$.
(ii) From Exercise 1.6, $Var(X) = n(n - 1)p^2 + np - (np)^2 = npq$.


3.12 (i) $P(X \le 10) = 1 - q^{10}$; (ii) $1 - (0.8)^{10} \approx 0.893$.








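3.14 If X is the number of tosses to the first success, then X has the Geometric
distribution with p = 1/6. Then:
(i) $P(X = 3) = \left(\frac{5}{6}\right)^2 \times \frac{1}{6} = \frac{25}{216} \approx 0.116$;
(ii) $P(X \ge 5) = \left(\frac{5}{6}\right)^4 \approx 0.482$.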

3.16 (i) EX = 7 E[X(X — D] = E (ii) Var(X) = ne 

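3.18 $\lambda = 2$.


3.20 $f(x + 1) = e^{-\lambda}\frac{\lambda^{x+1}}{(x + 1)!} = \frac{\lambda}{x + 1} \times e^{-\lambda}\frac{\lambda^x}{x!} = \frac{\lambda}{x + 1} f(x)$.


3.22 (i) $M_X(t) = e^{\lambda(e^t - 1)}$, $t \in \Re$, by applying the definition of $M_X$.
(ii) $EX = \frac{d}{dt}M_X(t)\big|_{t=0} = \lambda$, $EX^2 = \frac{d^2}{dt^2}M_X(t)\big|_{t=0} = \lambda(\lambda + 1)$, so that
$Var(X) = \lambda$.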
s21 o ELE < 008 0D EC) + CCH) + CY) = 0987. 


3.26 Writing the combinations in terms of factorials, and recombining the 
terms, we get the result. 


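3.28 (i) By integration, using the definition of $\Gamma(\alpha)$ and the recursive relation
for $\Gamma(\alpha + 1)$, we get $EX = \frac{\beta}{\Gamma(\alpha)}\Gamma(\alpha + 1) = \alpha\beta$. Likewise, $EX^2 =
\frac{\beta^2}{\Gamma(\alpha)}\Gamma(\alpha + 2) = \alpha(\alpha + 1)\beta^2$, so that $Var(X) = \alpha\beta^2$.

(ii) $EX = 1/\lambda$, $Var(X) = 1/\lambda^2$ from part (i).
(iii) $EX = r$, $Var(X) = 2r$ from part (i).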

3.30 (i) (a) With g(X) = cX, we have Eg(X) =cC/A. 

(b) With g(X) = c(1 — 0.5e7%X), we have Eg(X) = CEBO. 
(ii) (a) 10; (b) 1.5. 

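3.32 Indeed, $P(T > t) = P$(0 events occurred in the time interval (0, t)) $=
e^{-\lambda t}\frac{(\lambda t)^0}{0!} = e^{-\lambda t}$. So, $1 - F_T(t) = e^{-\lambda t}$, t > 0, and hence $f_T(t) = \lambda e^{-\lambda t}$, t > 0,
and T is as described.

3.34 (i) $\int_0^{\infty} \alpha\beta x^{\beta-1} e^{-\alpha x^{\beta}}\,dx = -\int_0^{\infty} de^{-\alpha x^{\beta}} = -e^{-\alpha x^{\beta}}\big|_0^{\infty} = 1$.

(ii) $\beta = 1$ and any $\alpha > 0$.
(iii) For n = 1, 2, ..., $EX^n = \Gamma\left(\frac{n}{\beta} + 1\right) \big/ \alpha^{n/\beta}$. Then $EX = \Gamma\left(\frac{1}{\beta} + 1\right) \big/ \alpha^{1/\beta}$,
$EX^2 = \Gamma\left(\frac{2}{\beta} + 1\right) \big/ \alpha^{2/\beta}$, and hence $Var(X) = \left[\Gamma\left(\frac{2}{\beta} + 1\right) - \Gamma^2\left(\frac{1}{\beta} + 1\right)\right] \big/
\alpha^{2/\beta}$.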


3.36 All parts (i)-(iv) are immediate. 
3.38 (i) $P(X \le c) = 0.875$, and $c = \mu + 1.15\sigma$; (ii) c = 7.30.
3.40 (i) 0.997020; (ii) 0.10565; (iii) 0.532807. 


3.42 Let X be the diameter of a ball bearing, and let p be the probability 
that a ball bearing is defective. Then p = P(X > 0.5+ 0.0006 or X < 
0.5 — 0.0006), and by splitting the probabilities and normalizing, we have 
that it is 0.583921. 


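3.44 (i) By the hint,

$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy = \frac{1}{2\pi}\int_0^{2\pi} d\theta \int_0^{\infty} re^{-r^2/2}\,dr = 1$.

(ii) $\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = 1$.


3.46 (i) $M_X(t) = e^{t^2/2} \times \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(x-t)^2}{2}}\,dx = e^{t^2/2} \times 1 = e^{t^2/2}$, $t \in \Re$.
(ii) With $Z = \frac{X - \mu}{\sigma}$, $e^{t^2/2} = M_Z(t) = M_{\frac{1}{\sigma}X + \frac{-\mu}{\sigma}}(t) = e^{-\frac{\mu t}{\sigma}} M_X\left(\frac{t}{\sigma}\right)$,
so that $M_X\left(\frac{t}{\sigma}\right) = e^{\frac{\mu t}{\sigma} + \frac{t^2}{2}}$, and $M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$ by replacing $\frac{t}{\sigma}$ by
$t \in \Re$.
(iii) By differentiation of $M_X(t)$ and evaluating at 0, we get: $EX = \mu$, $EX^2
= \mu^2 + \sigma^2$, so that $Var(X) = \sigma^2$.
3.48 (i) $EX^{2n+1} = 0$, and by the hint $EX^{2n} = (2n - 1)(2n - 3) \cdots 1 =
\frac{1 \times 2 \times \cdots \times (2n - 1) \times (2n)}{(2 \times 1) \times \cdots \times [2 \times (n - 1)] \times (2 \times n)} = \frac{(2n)!}{2^n [1 \cdots (n - 1)n]} = \frac{(2n)!}{2^n n!}$.
(ii) $EX = 0$, $EX^2 = 1$, so that $Var(X) = 1$.
(iii) With $Z = \frac{X - \mu}{\sigma}$, $0 = EZ = \frac{1}{\sigma}(EX - \mu)$, so that $EX = \mu$, and $1 =
Var(Z) = \frac{1}{\sigma^2}Var(X)$, so that $Var(X) = \sigma^2$.


3.50 (i) $P(-1 < X < 2) = \frac{3}{2\alpha} = 0.75$ and $\alpha = 2$.
(ii) $P(|X| < 1) = P(|X| > 2)$ is equivalent to $\frac{1}{\alpha} = 1 - \frac{2}{\alpha}$, from which
$\alpha = 3$.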


3.52 EX =}, so that: (i) —0.5 and (ii) 2(e — 1) = 3.44. 


4.2 (i) $x_p = [(n + 1)p]^{1/(n+1)}$; (ii) For p = 0.5 and n = 3: $x_{0.5} = 2^{1/4} \approx 1.189$.
4.4 (i) Ci = C2 = 1; (ii) 1/3 = 0. 
4.6 (i)

x      2     3     4     5     6     7     8     9     10    11    12
f(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

(ii) EX = 7; (iii) median = mode = mean = 7.
4.8 Mode = 25 and f(25) = (12) (14 )”; one would bet on X = 25. 


4.10 By the hint, $P(X \le c) = \int_{-\infty}^{c} f(x)\,dx = \int_0^{\infty} f(c - y)\,dy$, and $P(X \ge c) =
\int_c^{\infty} f(x)\,dx = \int_0^{\infty} f(c + y)\,dy$. Since $f(c - y) = f(c + y)$, it follows that
$P(X \le c) = P(X \ge c)$, and hence c is the median.

4.12 (i) $p = P(Y \le y_p) = P[g(X) \le y_p] = P[X \le g^{-1}(y_p)]$, so that
$g^{-1}(y_p) = x_p$ and $y_p = g(x_p)$.
(ii) $x_p = -\log(1 - p)$.
(iii) $y_p = 1/(1 - p)$.
(iv) $x_{0.5} = -\log(0.5) \approx 0.693$, and $y_{0.5} = 2$.
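
The quantile relation $y_p = g(x_p)$ of answer 4.12 can be illustrated numerically. The sketch assumes, consistently with parts (ii) and (iii), that X has the Negative Exponential distribution with $\lambda = 1$ and that $g(x) = e^x$; both are inferences from the stated answers rather than the exercise text, and the function names are hypothetical.

```python
import math


def exp_quantile(p):
    """p-th quantile of the Negative Exponential distribution with lambda = 1.

    Solves F(x) = 1 - exp(-x) = p for x.
    """
    return -math.log(1 - p)


def transformed_quantile(p, g=math.exp):
    """p-th quantile of Y = g(X) for a strictly increasing g: y_p = g(x_p)."""
    return g(exp_quantile(p))


x_half = exp_quantile(0.5)
y_half = transformed_quantile(0.5)
print(round(x_half, 3), round(y_half, 3))  # 0.693 2.0
```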
1.2 P(X = 0, Y = 1) = P(X = 0, Y = 2) = P(X = 1, Y = 2) = 0,
P(X = 0, Y = 0) = 0.3, P(X = 1, Y = 0) = 0.2, P(X = 1, Y = 1) = 0.2,
P(X = 2, Y = 0) = 0.075, P(X = 2, Y = 1) = 0.15, P(X = 2, Y = 2) =
0.075.

14 ON Aæ? + Daxdy=?x 1 = 1; P(X > Y) = Ë ~ 0.268. 

1.6 (i) $P(X \le x) = 1 - e^{-x}$, x > 0; (ii) $P(Y \le y) = 1 - e^{-y}$, y > 0;
(iii) $P(X < Y) = 0.5$; (iv) $P(X + Y < 3) = 1 - 4e^{-3} \approx 0.801$.


1.8 $c = 1/\sqrt{2\pi}$.
1.10 c = 6/7. 





Section 4.2 
2.2 $f_X(0) = 0.3$, $f_X(1) = 0.4$, $f_X(2) = 0.3$;
$f_Y(0) = 0.575$, $f_Y(1) = 0.35$, $f_Y(2) = 0.075$.
2.4 (i) $f_X(1) = 7/36$, $f_X(2) = 17/36$, $f_X(3) = 12/36$;
$f_Y(1) = 7/36$, $f_Y(2) = 14/36$, $f_Y(3) = 15/36$.
(ii) $f_{X|Y}(1|1) = 2/7$, $f_{X|Y}(2|1) = 2/7$, $f_{X|Y}(3|1) = 3/7$;
$f_{X|Y}(1|2) = 1/14$, $f_{X|Y}(2|2) = 10/14$, $f_{X|Y}(3|2) = 3/14$;
$f_{X|Y}(1|3) = 4/15$, $f_{X|Y}(2|3) = 5/15$, $f_{X|Y}(3|3) = 6/15$;
$f_{Y|X}(1|1) = 2/7$, $f_{Y|X}(2|1) = 1/7$, $f_{Y|X}(3|1) = 4/7$;
$f_{Y|X}(1|2) = 2/17$, $f_{Y|X}(2|2) = 10/17$, $f_{Y|X}(3|2) = 5/17$;
$f_{Y|X}(1|3) = 3/12$, $f_{Y|X}(2|3) = 3/12$, $f_{Y|X}(3|3) = 6/12$.


26 © KA= tL.. n A= EYE l.. 








n(n+1) ? 
GD farl) = y HL. n frula) = i, y= 1, 2 
YH 1k i ea Li Me 
(ii) EQAY = y = 28% y=1,...,1 





EYIX=0=*%HY)x=1,...,n 


2.8 fx(v) = $a+2,0<a<1; fry = $y? +2,0<yK<1. 


2.10 (i) $f_X(x) = xe^{-x}$, x > 0; $f_Y(y) = e^{-y}$, y > 0.
(ii) $f_{Y|X}(y|x) = e^{-y}$, x > 0, y > 0;
(iii) $P(X > \log 4) = \frac{1 + \log 4}{4} \approx 0.597$.


2.12 @) fx@) = Exr+ 1,0 <x < l; f= %+2,0<y<2; 
frx = 2 0<x<1,0<y<2. 


Gi) EY = $; EY |X =a) = $ x Z 0 <x <1. 


Gii) It follows by a direct integration. 


(iv) PŒ > $|X < į) = 2 ~ 0.739. 














2 
2.14 fxur(xly) = iye ze 2%, 0O<y<x. 


2.16 (i) $f_X(x) = 6x/7$ for 0 < x ≤ 1; $f_X(x) = 6x(2 - x)/7$ for 1 < x < 2; and $f_X(x) = 0$ elsewhere.


(ii) $f_{Y|X}(y|x)$ is 1 for 0 < x ≤ 1, and is 1/(2 − x) for 1 < x < 2 (and 0
otherwise), whereas 1 ≤ x + y ≤ 2.
2.18 (i) fxyCly) is the Poisson p.d.f. with parameter y. 
@) fora, y) =e "4,2 =0,1,,..5 


(iii) fx(v) = sn, v= 0,1... 


2.20 (i), (ii) follow by applying the definitions. 


3.2 It follows by an application of the definition and properties of a m.g.f. 
3.4 Apply the exercise cited in the hint with Z = X — Y and Z = X + Y. 
3.6 (i) $EX = 1$, $EY = 0.5$, $EX^2 = 1.6$, $EY^2 = 0.65$, so that $Var(X) = 0.6$
and $Var(Y) = 0.4$.


(ii) $E(XY) = 0.8$, so that $Cov(X, Y) = 0.3$ and $\rho(X, Y) = 1.25\sqrt{0.24} \approx
0.612$.
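
The moments in answer 3.6 can be recomputed from the joint probabilities listed in answer 1.2 of Section 4.1. The following is a minimal sketch; the dictionary `joint` and the helper `expectation` are hypothetical names, and the data are the six nonzero joint probabilities quoted in that answer.

```python
from math import sqrt

# Joint p.d.f. of (X, Y) from answer 1.2 of Section 4.1 (nonzero entries only).
joint = {
    (0, 0): 0.3, (1, 0): 0.2, (1, 1): 0.2,
    (2, 0): 0.075, (2, 1): 0.15, (2, 2): 0.075,
}


def expectation(h):
    """Return E[h(X, Y)] under the joint p.d.f. above."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


ex, ey = expectation(lambda x, y: x), expectation(lambda x, y: y)
var_x = expectation(lambda x, y: x * x) - ex**2
var_y = expectation(lambda x, y: y * y) - ey**2
cov = expectation(lambda x, y: x * y) - ex * ey
rho = cov / sqrt(var_x * var_y)
print(round(ex, 3), round(ey, 3), round(var_x, 3), round(var_y, 3))
print(round(cov, 3), round(rho, 3))
```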


(iii) The r.v.'s X and Y are positively correlated. 


3.8 (1) EX=%, EY =%, EX? = E, EY? = E, so that Var(X) = 659/36? 
and Var(Y) = 728/367. 


(ii) BUY) o so that Cov(X, Y)= — z% and p(X, Y)=-— 45 


3.10 EX =0, Var (X)=10/4, EY =5/2, EY? = 34/4, Var(Y)=9/4, E(XY)=0, 
so that Cov(X, Y) = 0 and p(X, Y) = 0. 

3.12 (i) $EX = EY = 7/12$;
(ii) $EX^2 = EY^2 = 5/12$, so that $Var(X) = Var(Y) = 11/144$.
(iii) $E(XY) = \frac{1}{3}$, so that $Cov(X, Y) = -\frac{1}{144}$ and $\rho(X, Y) = -\frac{1}{11}$.
(iv) X and Y are negatively correlated.

3.14 With $Var(X) = \sigma^2$, we get $Cov(X, Y) = a\sigma^2$ and $\rho(X, Y) = \frac{a}{|a|}$.
Thus, $|\rho(X, Y)| = 1$, and $\rho(X, Y) = 1$ if and only if a > 0, and $\rho(X, Y) = -1$
if and only if a < 0.


3.16 By differentiation, with respect to a and £, of the function g(a, 8) = 
E[Y — (aX + B)1?, and by equating the derivatives to 0, we find: @ = 
or p(X, Y), B = EY —GEX. The 2 x 2 matrix M of the second-order 


derivatives is given by: M = 4( ), which is positive definite. Then 


â and $ are minimizing values. 


Section 4.4


4.2 $M_{X_1, X_2, X_3}(t_1, t_2, t_3) = c^3 / [(c - t_1)(c - t_2)(c - t_3)]$, provided $t_1, t_2, t_3$ are < c.


4.4 Follows by applying properties of expectations.


Section 4.5


5.2 If X1, X2, and X3 are the numbers of customers buying brand A, brand B, 
or just browsing, then X1, X2, X3 have the Multinomial distribution with 
parameters n = 10, pı = 0.25, pz = 0.40, and p3 = 0.35. Therefore: 

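(i) $P(X_1 = 2, X_2 = 3, X_3 = 5) = \frac{10!}{2!\,3!\,5!}(0.25)^2 \times (0.40)^3 \times (0.35)^5 \approx 0.053$.


(ii) $P(X_1 = 1, X_2 = 3|X_3 = 6) = \frac{4!}{1!\,3!}\left(\frac{5}{13}\right)^1\left(\frac{8}{13}\right)^3 \approx 0.358$.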


5.4 They follow by taking the appropriate derivatives and evaluating them 
at 0. 








5.6 The second line in (51) follows from the first line by adding and subtract- 
ing the quantity pany. The expression in the following line follows 
by the fact that the first three terms on the previous line form a perfect 
square. What follows is obvious and results from a suitable regrouping of
the entities involved. 


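5.8 In Exercise 5.7, it was found that $E(XY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2$, where $\mu_1 = EX$,
$\mu_2 = EY$, $\sigma_1$ = s.d. of X and $\sigma_2$ = s.d. of Y. Then


$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_1\sigma_2} = \frac{(\mu_1\mu_2 + \rho\sigma_1\sigma_2) - \mu_1\mu_2}{\sigma_1\sigma_2} = \rho$.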





5.10 They follow by differentiating the m.g.f. and evaluating the derivatives 
at 0. 


5.12 (i) It is not a Bivariate Normal p.d.f., because, for x and y outside the 
interval [—1, 1], the given p.d.f. becomes f(x, y) = x exp[—(a? + 
y?)/2], which is the Bivariate Normal with y; = u2 = 0, 0, = 02 = 1, 
and p = 0, whereas f(—1, —1) = + 4 z}, the value of the Bivariate 








Normal just mentioned evaluated at x = y = —1. 
(ii) fo(y) = Fee 2 which is the p.d.f. of the N(0, 1) distribution. Sim- 


ilarly, f(x) = A 


Section 5.1


1.2 The relation $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ holds true for all values of x and y,
and therefore X and Y are independent.


1.4 The r.v.'s X and Y are not independent, since, e.g., $f_{X,Y}(0.1, 0.1) = 0.132 \ne
0.31824 = 0.52 \times 0.612 = f_X(0.1)f_Y(0.1)$.


16 © fx(0=%(?+3),0<x<1 fM=fy4+5),0<yK<1. 
(ii) The r.v.’s are not independent, since, e.g., fx,y(5, į) = 2 
Sx) fv). 
1.8 (i) $f_X(x) = 2x$, 0 < x < 1; $f_Y(y) = 2y$, 0 < y < 1; $f_Z(z) = 2z$, 0 < z < 1.
(ii) The r.v.'s are independent because, clearly,


$f_{X,Y,Z}(x, y, z) = f_X(x)f_Y(y)f_Z(z)$.
(iii) $P(X < Y < Z) = 1/6$.


9 To 
to * 10 = 


1.10 (i) c can be any positive constant. 
(ii) fxy(a, y) = e, x > 0, y > 0, and likewise for fx z and fy z. 
Gii) fx) = ce, x > 0, and likewise for fy and fz. 
(iv) The r.v.’s X and Y are independent, and likewise for the r.v.’s X, Z 
and Y, Z. Finally, from part (iii), it follows that the r.v.'s X, Y, and Z 
are also independent. 


L. Section 5.2 
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1.12 (i) EX = 200 days; (ii) Mysy(t) = 1/1 — 2000, t< 0.005, and 
fx+y Œ) = (0.005)?te7-2005t ¢ > 0, 
Gii) P(X + Y > 500) = 2.5e-25 + e 25 ~ 0.287. 
1.14 (i) $M_U(t) = \exp[(a\mu_1 + b)t + \frac{(a\sigma_1)^2t^2}{2}]$, which is the m.g.f. of the $N(a\mu_1 + b, (a\sigma_1)^2)$
distribution. Likewise for $V$.
(ii) $M_{U,V}(t_1, t_2) = \exp[(a\mu_1 + b)t_1 + \frac{(a\sigma_1)^2t_1^2}{2} + (c\mu_2 + d)t_2 + \frac{(c\sigma_2)^2t_2^2}{2}]$.
(iii) Follows from parts (i) and (ii), since $M_U(t_1)M_V(t_2) = M_{U,V}(t_1, t_2)$
for all $t_1$, $t_2$.
1.16 $M_{\bar X}(t) = [M(t/n)]^n$.
1.18 (i) $E\bar X = p$ and $\mathrm{Var}(\bar X) = pq/n$; (ii) $n = 10{,}000$.
1.20 (i) $f_X(-1) = 2\alpha + \beta$, $f_X(0) = 2\beta$, $f_X(1) = 2\alpha + \beta$;
$f_Y(-1) = 2\alpha + \beta$, $f_Y(0) = 2\beta$, $f_Y(1) = 2\alpha + \beta$.
(ii) $EX = EY = 0$, and $E(XY) = 0$; (iii) $\mathrm{Cov}(X, Y) = 0$.
(iv) The r.v.'s are not independent, since, e.g., $f(0, 0) = 0 \ne (2\beta) \times (2\beta) =
f_X(0) f_Y(0)$.
1.22 (i) $E\bar X = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$.
(ii) For $k = 1$, $n = 100$; for $k = 2$, $n = 25$; and for $k = 3$, $n = 12$.


1.24 (i) $E\bar X = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$.
(ii) The smallest $n$ which is $\ge 1/[(1 - \alpha)c^2]$.
(iii) For $c = 0.1$, the required $n$ is the smallest integer $\ge 100/(1 - \alpha)$. For $\alpha =
0.90$, $n = 1{,}000$; for $\alpha = 0.95$, $n = 2{,}000$; for $\alpha = 0.99$, $n = 10{,}000$.
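The sample sizes in Exercise 1.24(iii) come from the Tchebichev bound: the smallest integer $n$ with $n \ge 1/[(1-\alpha)c^2]$. A minimal sketch is given below; exact rational arithmetic is used (an implementation choice, not something the text prescribes) so that boundary cases such as 100/0.1 are not pushed up by floating-point error.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(alpha: str, c: str) -> int:
    """Smallest integer n with n >= 1 / ((1 - alpha) * c^2), computed exactly."""
    a, cc = Fraction(alpha), Fraction(c)
    return math.ceil(1 / ((1 - a) * cc ** 2))


for alpha in ("0.90", "0.95", "0.99"):
    print(alpha, chebyshev_sample_size(alpha, "0.1"))
```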


2.2 X +Y ~ B(30, 1/6) and P(X +Y < 10) = MANDES 

2.4 (i) $S_n \sim B(n, p)$;
(ii) $EX_i = p$, $\mathrm{Var}(X_i) = pq$ $(q = 1 - p)$.
(iii) $ES_n = np$, $\mathrm{Var}(S_n) = npq$.

2.6 If $X$ is the r.v. denoting the breakdown voltage, then $X \sim N(40, 1.5^2)$,
and therefore:
(i) $P(39 < X < 42) = 0.656812$; (ii) = 0.382.
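The first probability in Exercise 2.6 can be reproduced numerically. The sketch below takes $X \sim N(40, 1.5^2)$ and the interval $(39, 42)$, which is the reading under which the tabulated value 0.656812 is obtained with z-values rounded to two decimals; the function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def normal_interval_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma^2)."""
    return STD_NORMAL.cdf((b - mu) / sigma) - STD_NORMAL.cdf((a - mu) / sigma)


# Exact standardization: z-values are 1.3333... and -0.6666...
p_exact = normal_interval_prob(39, 42, 40, 1.5)
# Table lookup with z rounded to two decimals, as in printed Normal tables.
p_table = STD_NORMAL.cdf(1.33) - STD_NORMAL.cdf(-0.67)
print(round(p_exact, 4), round(p_table, 6))
```

The small gap between the two numbers is due only to rounding the z-values before the table lookup.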

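2.8 (i) $X_1 + \cdots + X_n \sim P(\lambda_1 + \cdots + \lambda_n)$.
(ii) $E\bar X = (\lambda_1 + \cdots + \lambda_n)/n$, $\mathrm{Var}(\bar X) = (\lambda_1 + \cdots + \lambda_n)/n^2$.
(iii) $E\bar X = \lambda$, $\mathrm{Var}(\bar X) = \lambda/n$.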

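2.10 (i) $P(X_1 = s \mid T = t) = \binom{t}{s}\left(\frac{\lambda_1}{\lambda}\right)^s\left(1 - \frac{\lambda_1}{\lambda}\right)^{t-s}$, so that $X_1 \mid T = t \sim B(t, \frac{\lambda_1}{\lambda})$,
and likewise for the other r.v.'s.

(ii) Here $\lambda = nc$, and therefore $X_1 \mid T = t \sim B(t, \frac{1}{n})$.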
2.12 (i) PX > Y) = ¡-o(- +3) (ii) PX > Y) = 0.5.


2.14 (i) For $r > 0$, $P(R \le r) = P(U \le r^2/\sigma^2)$, $U \sim \chi^2_2$.
(ii) For $\sigma = 1$ and the given values of $r$, the respective probabilities are:
0.75, 0.90, 0.95, 0.975, 0.99, and 0.995.


Section 6.1


12 @ X~ NCAR, 20%) (ii) a ~ 32.222, b = 35. 


5u—160 5o = — Bu ie Sa 
E +k 








(iii) a = 
1.4 fr = ay, y > 1; fD = re? ze R. 


1.6 (i) $f_Y(y) = \frac{1}{2}e^{-y/2}$, $y > 0$, which is the p.d.f. of a $\chi^2_2$.
(ii) $\sum_{i=1}^n Y_i \sim \chi^2_{2n}$, since $Y_i \sim \chi^2_2$, $i = 1, \ldots, n$ independent.


1.8 Sv) = ari”, y> 0. 
1.10 The results follow by implementing the suggestions in the hint. 


1.12 The results follow by implementing the suggestions in the hint. 


Section 6.2
2.2 (i) $f_{U,V}(u, v) = \frac{ue^{-u}}{(1 + v)^2}$, $u > 0$, $v > 0$.
(ii) $f_U(u) = ue^{-u}$, $u > 0$; $f_V(v) = 1/(1 + v)^2$, $v > 0$.
(iii) $U$ and $V$ are independent.
24 (Ò fuvyv)= 2 EA (u, ve T. 


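(ii) $f_{U,V}(u, v) = \frac{1}{\sqrt{2\pi}\,|a|\sigma_1}\exp\left\{-\frac{[u - (a\mu_1 + b)]^2}{2(a\sigma_1)^2}\right\} \times \frac{1}{\sqrt{2\pi}\,|c|\sigma_2}\exp\left\{-\frac{[v - (c\mu_2 + d)]^2}{2(c\sigma_2)^2}\right\}$,


and therefore $U$ and $V$ are independently distributed as $N(a\mu_1 + b,
(a\sigma_1)^2)$ and $N(c\mu_2 + d, (c\sigma_2)^2)$, respectively.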
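2.6 (i) $f_{U,V}(u, v) = \frac{1}{\sqrt{2\pi}}e^{-u^2/2} \times \frac{1}{\sqrt{2\pi}}e^{-v^2/2}$, $u, v \in \mathbb{R}$.
(ii) $U \sim N(0, 1)$, $V \sim N(0, 1)$.
(iii) $U$ and $V$ are independent.
(iv) By parts (ii) and (iii), $X + Y \sim N(0, 2)$ and $X - Y \sim N(0, 2)$.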


2.8 $f_U(u) = 1$, $0 < u < 1$.








Section 6.3 


Section 6.5
3.2 It follows by forming the inner products of the row vectors.


3.4 It follows from the joint p.d.f. $f_{X,Y}$, the transformation $u = \frac{x - \mu_1}{\sigma_1}$, $v = \frac{y - \mu_2}{\sigma_2}$,
and the fact that the Jacobian $J = \sigma_1\sigma_2$.


3.6 (i) It follows from the joint p.d.f. $f_{X,Y}$, the transformation $u = x + y$, $v =
x - y$, and the fact that the Jacobian $J = -1/2$.
(ii) $U$ and $V$ are independent by the fact that they have the Bivariate
Normal distribution and their correlation coefficient is 0.
(iii) It follows from part (i) as marginals of the Bivariate Normal.
3.8 (i) $P(a\mu < \bar X < b\mu,\ 0 < S^2 < c\sigma^2) = [\Phi(k(b - 1)\sqrt{n}) - \Phi(k(a - 1)\sqrt{n})] \times
P(\chi^2_{n-1} < c(n - 1))$; (ii) The probability is 0.89757.





5.2 EY, = zh, EY, = 5%, and EY, > lasn— 00. 


5.4 E(Yi Yn) = en Therefore, by Exercise 5.2, Cov(Y,, Yn) = 


5.6 fz(2) = re, z > 0. 
5.8 (1) IYn) = me (1 — ey"), ya > 0. 
(ii) For n= 2, EY, = 3/24, and for n= 3, EY; = 11/61. 


5.10 $g_{1n}(y_1, y_n) = n(n - 1)[F(y_n) - F(y_1)]^{n-2} f(y_1) f(y_n)$, $a < y_1 < y_n < b$.


Section 7.1


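1.2 For every $\varepsilon > 0$, $P(|X_n| > \varepsilon) = P(X_n = 1) = p_n$, and therefore $X_n \xrightarrow{P} 0$ if
and only if $p_n \to 0$ as $n \to \infty$.
1.4 (i) $P(|Y_{1,n}| > \varepsilon) = (1 - \varepsilon)^n \to 0$, as $n \to \infty$.
(ii) $P(|Y_{n,n} - 1| > \varepsilon) = 1 - P(|Y_{n,n} - 1| \le \varepsilon)$ and $P(|Y_{n,n} - 1| \le \varepsilon) =
1 - (1 - \varepsilon)^n \to 1$, so that $P(|Y_{n,n} - 1| > \varepsilon) \to 0$, as $n \to \infty$.
1.6 $E\bar X_n = \mu$ and $E(\bar X_n - \mu)^2 = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0$, as $n \to \infty$.


1.8 $E(Y_n - X)^2 = E(Y_n - X_n)^2 + E(X_n - X)^2 + 2E[(Y_n - X_n)(X_n - X)] \to 0$, as
$n \to \infty$, by the assumptions made and the fact that $|E[(Y_n - X_n)(X_n - X)]| \le
E^{1/2}|X_n - Y_n|^2 \times E^{1/2}|X_n - X|^2$.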
Section 7.2


2.2 (i) $M_X(t) = (1 - \alpha)/(1 - \alpha e^t)$, $t < -\log\alpha$;
(ii) $EX = \alpha/(1 - \alpha)$.
Gii) Mz © = (5) = = {1 at/(1—w) + [a/(1—a)nR( 4 Dy- f > patio a), 


1-aet/n NAO y 


since ?R(£) 2, 0 for fixed t, and parta) j is the m.g.f. of ¡2 





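2.4 Since $X \sim B(1{,}000, p)$, we have:
(i) $P(1{,}000p - 50 \le X \le 1{,}000p + 50) = \sum_{x=1{,}000p-50}^{1{,}000p+50}\binom{1{,}000}{x}p^xq^{1{,}000-x}$,
$q = 1 - p$. For $p = \frac{1}{2}$ and $p = \frac{1}{4}$:

$$P(450 \le X \le 550) = \sum_{x=450}^{550}\binom{1{,}000}{x}(0.5)^{1{,}000},$$

$$P(200 \le X \le 300) = \sum_{x=200}^{300}\binom{1{,}000}{x}(0.25)^x(0.75)^{1{,}000-x}.$$

(ii) For $p = \frac{1}{2}$ and $p = \frac{1}{4}$, the approximate probabilities are $2\Phi(3.16) -
1 = 0.998422$ and $2\Phi(3.65) - 1 = 0.999738$.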
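The exact Binomial sums and the Normal approximations in Exercise 2.4 can be compared directly. This is a rough sketch under the reading $n = 1{,}000$ with $p = 1/2$ and $p = 1/4$; the function names are illustrative, and no continuity correction is applied because the quoted answers do not use one.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_binomial_prob(n: int, p: float, lo: int, hi: int) -> float:
    """P(lo <= X <= hi) for X ~ B(n, p), summing the p.d.f. term by term."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(lo, hi + 1))


def clt_symmetric_prob(n: int, p: float, half_width: float) -> float:
    """CLT approximation 2*Phi(z) - 1 with z = half_width / sqrt(n p q)."""
    z = half_width / sqrt(n * p * (1 - p))
    return 2 * NormalDist().cdf(z) - 1


exact_half = exact_binomial_prob(1000, 0.5, 450, 550)
exact_quarter = exact_binomial_prob(1000, 0.25, 200, 300)
approx_half = clt_symmetric_prob(1000, 0.5, 50)
approx_quarter = clt_symmetric_prob(1000, 0.25, 50)
print(exact_half, approx_half)
print(exact_quarter, approx_quarter)
```

The approximations differ slightly from the printed values only because the text rounds $z$ to 3.16 and 3.65 before consulting the Normal tables.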
2.6 $EX_i = \frac{7}{2}$, $EX_i^2 = \frac{91}{6}$, so that $\mathrm{Var}(X_i) = \frac{35}{12}$. Therefore $P(150 \le X \le
200) \simeq 2\Phi(2.07) - 1 = 0.961548$.


2.8 Since $X \sim B(1{,}000, 0.03)$, the required approximate probability is:
$P(X \le 50) \simeq \Phi(3.71) = 0.999896$.
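The z-value in Exercise 2.8 follows from standardizing 50 under $B(1{,}000, 0.03)$, which has mean 30 and variance 29.1. A minimal sketch of that computation, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.03
mean = n * p                   # 30
sd = sqrt(n * p * (1 - p))     # sqrt(29.1)
z = (50 - mean) / sd           # standardized cutoff
z_rounded = round(z, 2)        # rounded as for a table lookup
prob = NormalDist().cdf(z_rounded)
print(z_rounded, round(prob, 6))
```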


2.10 $P(|\bar X - 0.53| \le 0.02) \simeq 2\Phi\left(\frac{0.02\sqrt{n}}{\sqrt{0.53 \times 0.47}}\right) - 1 = 0.99$, so that $n = 4{,}146$.


2.12 With S, = > ';_, X;, we have: 
(i) P(S, < An) = P(n) — 0.50. 
(ii) P(S, = An) ~ O(VAn) — O(1/V An). 
(ii) PË < Sn < 2n) ~ x &(/An/2) — &(/An/4). 
(iv) P(S, < 100) ~ 0.50, P(S, > 100) ~ 0.460172, P(50 < S, < 75) ~ 
0.00621. 


2.14 The total lifetime is $X = \sum_i X_i$, where the $X_i$'s are independently distributed
as Negative Exponential with $\lambda = 1/1{,}500$. Then $P(X > 80{,}000) \simeq
1 - \Phi(0.47) = 0.319178$.


2.16 (i) $P(a < \bar X < b) \simeq \Phi((2b - 1)\sqrt{3n}) - \Phi((2a - 1)\sqrt{3n})$.
(ii) Here $(2b - 1)\sqrt{3n} = 0.75$, $(2a - 1)\sqrt{3n} = -0.75$, and the above
probability is: $2\Phi(0.75) - 1 = 0.546746$.
2.18 $P(|\bar X - \mu| \le 0.0001) \simeq 2\Phi(0.2\sqrt{n}) - 1 = 0.99$, and then $n = 167$.
2.20 (i) $P(|\bar X_n - \mu| < k\sigma) \simeq 2\Phi(k\sqrt{n}) - 1 = p$, so that $n = \left[\frac{1}{k}\Phi^{-1}\left(\frac{1+p}{2}\right)\right]^2$.
(ii) Here $n$ is the smallest integer $\ge 1/[(1 - p)k^2]$.


(iii) For p = 0.90, p = 0.95, and p = 0.99, and the respective values 
of k, we determine the values of n by means of the CLT and the 
Tchebichev inequality. 
Then, for the various values of k, the respective values of n are given in
the following table for part (i). 


k\p 0.90 0.95 0.99 


0.50 11 16 27 
0.25 44 62 107 
0.10 271 385 664 


For the Tchebichev inequality, the values of n are given by the entries of 
the table below. 


k\p 0.90 0.95 0.99


0.50 40 80 400
0.25 160 320 1,600
0.10 1,000 2,000 10,000
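Both tables for Exercise 2.20(iii) can be regenerated from the two rules: the CLT gives the smallest integer $n \ge [\Phi^{-1}((1+p)/2)/k]^2$, and the Tchebichev inequality gives the smallest integer $n \ge 1/[(1-p)k^2]$. The sketch below is one way to do this; the function names are illustrative, and exact fractions are used for the Tchebichev bound (an implementation choice) to avoid floating-point rounding at integer boundaries.

```python
import math
from fractions import Fraction
from statistics import NormalDist

K_VALUES = ["0.50", "0.25", "0.10"]
P_VALUES = ["0.90", "0.95", "0.99"]


def n_clt(k: float, p: float) -> int:
    """Smallest n with 2*Phi(k*sqrt(n)) - 1 >= p."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return math.ceil((z / k) ** 2)


def n_chebyshev(k: str, p: str) -> int:
    """Smallest n with 1 - 1/(n k^2) >= p, computed with exact fractions."""
    kk, pp = Fraction(k), Fraction(p)
    return math.ceil(1 / ((1 - pp) * kk ** 2))


clt_table = [[n_clt(float(k), float(p)) for p in P_VALUES] for k in K_VALUES]
chebyshev_table = [[n_chebyshev(k, p) for p in P_VALUES] for k in K_VALUES]
for k, clt_row, cheb_row in zip(K_VALUES, clt_table, chebyshev_table):
    print(k, clt_row, cheb_row)
```

The comparison makes the cost of the distribution-free Tchebichev bound visible: for $k = 0.10$ and $p = 0.99$ it asks for 10,000 observations where the CLT asks for 664.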





2.22 (i) PAX —Y| < 0.250) = P(|Z| < 0.250) = 205) — 1 = 0.95 and 
then n = a 


Gi) From 1 — > 0.95, we find n = 640. 


nz = 
2.24 (i) P(|X —20| < 2)~ DC- De) and taking me = nz, 
we find n= 50. 


(ii) For n= 50, the approximate probability in part (i) is 0.438086. 
(iii) As in the hint. 


Chapter 9


Section 9.1














1.2 The matrix is negative definite, because for à;, 42 with åf +12 4 0, we 


have: 
n/s 0 A 
(4, (4 a aoa) « le -45 = Maa 4 <0. 


14 With Y = Yẹ}, (4) ~ x2, and S = o?Y/(n — 1), we have 


Yara o nD. 
1.6 With L(0 |x) = 6" — ee. X = (t, ..., &n), we have 4 log L(9 | x) = 


0 produces 9 = 1/%, and 4 log L@|x) = -ġ — EY < n 








2 
1.8 The MLE of $\theta$ is $\hat\theta = -n/\sum_{i=1}^n \log x_i$.


1.10 (i) It follows from what is given;
(ii) 6 = 2/2. 
1.12 (i) abe ot two ¡OS are immediate. The third follows from
a (4 2 ay and similarly for the fourth. The fifth follows from 
of 
1 
i= = riy and (5) = athe: 
(ii) Immediate from part (i). 
(iii) 11 = X and fiz = Y follow by solving the equations: 
024] — 01042 = 02% — 01 py, 02Pp1 — 012 = 020% — 01Y 
following from the first two likelihood equations. 
(iv) They follow from part (ii). 
(v) They follow from part (iv) by solving for oí, 07, and p. 
1.14 (i) Immediate by the fact _ that _dy3(=d31) = dy4(=d4,) = dis(=d;51) = 
d23(=d32) = da (=d42) = d25(=đ52) = 0. 
(ii) Immediate by the fact that d,z = da, and d3s = dz. 
(iii) Immediate by the fact that dz; = di2, das = d3a, ds4 = das, and 
ds3 = das. : : 
(iv) Dı = Ds = ©, D3 = wap) , Di = 5. 
— Le p_ _ op w 
(v) A _ a > E a vi, C= — ae . 
(vi) Ds = - Le. 
(vii) D) = 1 > 0, D, = —% < 0, Da = È > 0, Ds = -ERA < 0, 
Da = 77 > 0, and Ds = -F <0. 
Section 9.2
2.2 (i) R@; 0) =e "%; Gi) The MLE of R(x; 0) is e 7%, 
24) 6=1> 3,8; GÂ = 
2.6 (i) It follows by integration. 
log X;. 


(ii) Iha X: is a sufficient statistic for 9, and so is ;_, 
2.8 X; 1X;] is a sufficient statistic for 0. 


2.10 Here(Xy, ..., X;-)isaset of statistics sufficient for (py, . . . , pr), or (X1, ..., 
v-1) is a set of statistics sufficient for (pı, ..., Pr-1). Furthermore, 
(Xm, Xu) is a set of statistics sufficient for (a, 6). 


2.12 They follow by integration. 


Section 9.3 


32 (i) EX = 4, Es(nY) = 3. 
(ii) Varg(X) = a < $ = Vare(nY). 








3.4 @ gı(y) = = 7 = pe As ys 02, 
Jnly) = 0o— 2 A yt > 01 < YS 0%. 
Then, with $\theta = (\theta_1, \theta_2)$,
E — nó A ie _ aA a and 








Es(úpL) = 430%, Emily, — Y)] = Op — 01. 
(ii) The pair of statistics (Y, , Y, ) is sufficient for the pair of the parameters 
(01, 02). 


3.6 They follow by integration and by using part (ii) of Exercise 3.5. 


3.8 (i) The Vars(U¡) and Varg(U2) follow from the Var¿(Y,) and Var¿(Y,) in 
Exercise 3.6, and the Covy(Y,, Yn) from Exercise 3.7(ii). 


(11) Immediate by comparing variances. 


3.10 (i) The aid condition is );_, & = 1. 
Gi) c= -,i=1,. 


3.12 (1) re 0 and Varo (X) = > —— = 0" 


(ii) E,8S?=0 and Vaty(S2) =~ z= ae 





3.14 It follows by the hint and Exercise 3.13. 


3.16 (i) See application to Theorem 3. 
(ii) Esh(X) = Ois equivalent to )~"_) h(a)(")t® = 0(t = 2) from which 
it follows h(x) = 0, x =0,1,...,n”. 
(iii) It follows by the fact that X = Z and T ~ B(n, 0) is sufficient and 
complete. 


3.18 (i) X is sufficient by the Factorization Theorem, and also complete, 
because Esh(X) = It Ny Rat = 0 (t = 1 — 0) implies h(x) = 


VpS 2 

Gi) U is unbiased, because EU = 1 x PU = 1) = RU = 1) = 
P(X =1)=0. 

Gii) U is UMVU, because it is unbiased and depends only on the sufficient 
statistic X. 


(iv) Vara(U) =0(1—0) > PU - 0) = 5 0 <90 < 1). 


3.20 (i) Sufficiency of Y,, follows from Exercise 2.9(111); completeness cannot 
be established here. 
(ii) It follows from part (i) and Exercise 3.3(ii). 


(iii) Because the function L(0 | x) (= f (œ; @))is not differentiable at 0 = x. 


L. Section 9.4 





4.2 It follows by the fact that i, i. cpa x*!(1—4)'-!dax = 1 and the recursive 
relation of the Gamma function. 
4.4 MEE from the hint. 


4.6 R(0;d)= = 2 independent of 6, and then Theorem 9 applies. 
Section 9.5
52 05=(1-20/(X-D; (ii) = 3.andé ~ 3.116. 


5.4 Follows from Exercise 3.15(i). 


5.6 (i) The EX and EX? follow by integration, and Var(X) = f?. 
(ii) & = X — S, B =S, where S? = 1% ,(X; — XY. 


5.8 By equating the second-order sample moment to o, we get 6 = 
È Di A). 
5.10 (i) It follows by integration; 
Gi) They also follow by integration. - 
Gii) 6 = 3X and hence Eð = 0 and Varg (5) = 0? /2n. 
5.12 (i) and (ii) follow by straightforward calculations. 
(iii) It follows by the expression of Pn(X, Y ) and the WLLN applied to 
nt? GM), X, Y a EA- 2, and nt E; PY. 
5.14 F ~ 161.533, sy = “7.825 ~ 27.954, y= 140.6, and sy = S574 ~ 
Section 10.1
1.2 m= 4n. 


14 © n= Q2.20/0*; Gi) n= 1,537. 
1.6 (i) $M_U(t) = 1/(1 - \theta t)^n$, $t < \frac{1}{\theta}$, which is the m.g.f. of the Gamma distribution
with $\alpha = n$ and $\beta = \theta$.
(ii) $M_V(t) = 1/(1 - 2t)^{2n/2}$, $t < \frac{1}{2}$, which is the m.g.f. of the $\chi^2_{2n}$ distribution,
so that $V \sim \chi^2_{2n}$.
üi) Pot, 2a Y Xil. 
1.8 (i) For $x > \theta$, $F(x; \theta) = 1 - e^{-(x-\theta)}$ and hence $g(y; \theta) = ne^{-n(y-\theta)}$,
$y > \theta$.
(ii) By setting $t = 2n(y - \theta)$, we get $f_T(t; \theta) = \frac{1}{2}e^{-t/2}$, $t > 0$, which is the
p.d.f. of the $\chi^2_2$ distribution.
(iii) It follows by the usual arguments. 
1.10 (i) The transformation y = |x| which yields x = y for x > 0, x = —yfor 
x < 0, and | = 1. Then, by Theorem 6, g(y; 0) = pe, y> 0. 
(ii) and (iii) are as in Exercise 1.9(ii) and (iii), respectively. 
112 )e*-e*=1-a. 




(ii) Immediate by part (i) and the fact that T has the p.d.f. e*, t > 0. 
(iii) Follows from the hint. 

1.14 From the transformations r = Yn — yı, and s = yı, We get: Y1 = S, Yr = 
r+s,and|J| = 1. Then frs(7, s;0) = LD 0 <r < 0,0 <s < 6-1, 
and then fp(7;0) = “+ Yr"6 —1),0<r<0. 

1.16 (i) It follows by using the transformation t = 7/0. 


(ii) The required confidence interval is $[R, \frac{R}{c}]$. The relation $c^{n-1}[n -
(n - 1)c] = \alpha$ follows from:

$$1 - \alpha = P_\theta(c \le T \le 1) = \int_c^1 n(n - 1)t^{n-2}(1 - t)\,dt = n - nc^{n-1} - (n - 1) + (n - 1)c^n.$$
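The constant $c$ in Exercise 1.16(ii) has no closed form for general $n$, but the left-hand side $c^{n-1}[n - (n-1)c]$ is increasing in $c$ on $(0, 1)$ (its derivative is $n(n-1)c^{n-2}(1-c) \ge 0$), so $c$ can be found by bisection. The following is a minimal sketch under that observation; the function names and the sample values $n = 10$, $\alpha = 0.05$ are illustrative assumptions, not data from the exercise.

```python
def tail_mass(c: float, n: int) -> float:
    """c^(n-1) * [n - (n-1) c], which must equal alpha for the interval [R, R/c]."""
    return c ** (n - 1) * (n - (n - 1) * c)


def solve_c(n: int, alpha: float, tol: float = 1e-12) -> float:
    """Bisection on (0, 1); tail_mass increases from 0 to 1 on this interval."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_mass(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = solve_c(n=10, alpha=0.05)
print(c, tail_mass(c, 10))
```

With $c$ in hand, the observed interval is obtained by plugging the sample range $R$ into $[R, R/c]$.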


1.18 Follows by the usual procedure and the fact that a. ~ x2, — ~ x2 


independent, so that ¿72% sE y/ le ~ Fum 


Section 10.2


2.2 The required interval is la% se 0%), where 0 < a < b with Pla < X < 
b)= =1-— a, X~F, —1,m-1- In ‘particular, a= Fn-1,m-11-2> = Fy-1,m-1;%- 


Section 10.4





4.2 (i) The required confidence interval is X,, + E za Se 
Gi) Here X 100 + 0.196100. 
Gii) The lengthis 220 e 


n` 





HA which converges in probability to 0 since S,, 5 O. 
44 G) PQ; < xp) = P(atleast i of Xi, ..., Xn = %) = Mi la, 
and also P(Y; < %p) = P(Y; < %p < Y;) + P(Y; < Xp); so that 
PY, < tp SY) = Dea pra o Pa" Ein Geka". 
(ii) and (iii) follow from part (i) and the Binomial tables. 
(iv) It follows from the hint and the Binomial tables. 


Section 11.1


1.2 $H_{0i}$ and $H_{Ai}$, $i = 1, \ldots, 5$ are all composite; $H_{06}$ and $H_{A6}$ are both simple.


Section 11.2


2.2 (i) The required n is determined by solving for n (and C) the two equa- 
tions: % = 91 — a) and LCD = 91 — (0). 
Gi) n = 9 (and C = 0.562). 
2.4 (i) The MP test rejects $H_0$ when $\bar x > C_n$, where $C_n = \mu_0 + \frac{\sigma}{\sqrt{n}}\Phi^{-1}(1 - \alpha_n)$.
(ii) The required C, and n are given by: 


(u1 — Ho) ®1(1 — an) 


Cn = Ho + 5d a) — (1 — mp)’ 








2 
n= | = [oa — An) — o(a = mol} : 
Hı — Lo 


Gii) That a, ,=20 follows from 1 — y US) = a, and the fact that 


n—>00 


Mo < Cy; that 7,21 follows from 1 — pot O = Tn, and the 


fact that Cn < pj. 
(iv) n= 33 and C33 ~ 0.546. 
2.6 (i) C@) = 4, Q(0) = —] strictly increasing, T(x) = x, and h(x) = 
I(0,00)(#). 
(ii) The UMP test rejects Ho when > ;_, xi < C, where C is determined 
by Pa (7-1 Xi < he mas 
Gii) Ta x (0) = ay oF @< +), and Map", xu = = My», x) = = 
which is the m.g.f. of the x¿, distribution. 





1 
a- => Aj = (12077 > 
(iv) C = 8x34 y TO) = PA < 8), X~ xh, 
(v) The closest value we can get by means of the x?-tables is n = 23. 
2.8 (i) The MP test rejects Hy when no > C, C ~ 0.842. 
Gi) 7 = 0.823. 
(iii) By means of geometric considerations, we find z = 0.822. 








Af) 
2 po 
2.10 (i) The MP test rejects Họ when xe [x > >, 0 integer; 1.36 x x > Ch,
where C is determined by Pm ({1.36 x = > C}) =a. 


(ii) For $C = 2$, it follows that $\alpha \simeq 0.02$.


Section 11.3


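3.2 (ii) The UMP test is given by:

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i < C \\ \gamma & \text{if } \sum_{i=1}^n x_i = C \\ 0 & \text{if } \sum_{i=1}^n x_i > C, \end{cases}$$

where $C$ and $\gamma$ are determined by:
$P_{p_0}(X < C) + \gamma P_{p_0}(X = C) = \alpha$, $X \sim B(n, p_0)$.
(iii) Here $C \simeq 3.139$, and $H_0$ is rejected when $X \le 3$.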


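3.4 (i) With each student associate a $B(1, p)$ r.v. $X_i$, $i = 1, \ldots, 400$, so that
$X = \sum_{i=1}^{400} X_i \sim B(400, p)$. Then the UMP test is given by:

$$\varphi(x_1, \ldots, x_{400}) = \begin{cases} 1 & \text{if } \sum_{i=1}^{400} x_i < C \\ \gamma & \text{if } \sum_{i=1}^{400} x_i = C \\ 0 & \text{if } \sum_{i=1}^{400} x_i > C, \end{cases}$$

where $C$ and $\gamma$ are determined by:


$P_{0.25}(X < C) + \gamma P_{0.25}(X = C) = 0.05$, $X \sim B(400, 0.25)$.


(ii) Here $C \simeq 85.7543$, and $H_0$ is rejected when $\sum_{i=1}^{400} x_i \le 85$.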
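The non-integer cutoff in Exercise 3.4(ii) is what a Normal approximation to $B(400, 0.25)$ produces: $C \simeq np_0 + z_{0.05}\sqrt{np_0q_0}$ with $z_{0.05}$ the lower 5% point of $N(0,1)$. A short sketch under that assumption follows; the function name is illustrative, and the last decimals differ slightly from the printed 85.7543 depending on how the Normal quantile is rounded.

```python
import math
from statistics import NormalDist


def lower_cutoff_normal_approx(n: int, p0: float, alpha: float) -> float:
    """Approximate C with P_{p0}(X < C) = alpha, using X ~ N(n p0, n p0 q0)."""
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    return mean + NormalDist().inv_cdf(alpha) * sd  # inv_cdf(alpha) is negative here


C = lower_cutoff_normal_approx(400, 0.25, 0.05)
print(round(C, 3), math.floor(C))  # reject H0 when the observed sum is <= floor(C)
```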


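3.6 (i) The UMP test is given by relation (19) with $C$ and $\gamma$ defined by relation
(20).
(ii) Here $C = 5$ and $\gamma = 0.519$.
(iii) $\pi(0.375) \simeq 0.22$ and $\pi(0.500) \simeq 0.505$.
(iv) For $\theta > 0.5$, $\pi(\theta) = P_{1-\theta}(X \le n - C - 1) + \gamma P_{1-\theta}(X = n - C)$.
(v) $\pi(0.625) \simeq 0.787$, and $\pi(0.875) \simeq 0.998$.
(vi) $n = 62$.
3.8 (i) $H_0: \theta = 10$, $H_A: \theta < 10$.
(ii) The UMP test is given by:

$$\varphi(x) = \begin{cases} 1 & \text{if } x < C \\ \gamma & \text{if } x = C \\ 0 & \text{if } x > C, \end{cases} \qquad P_{10}(X \le C - 1) + \gamma P_{10}(X = C) = 0.01,\quad X \sim P(10).$$

From the Poisson tables, $C = 3$ and $\gamma = 0.96$, and since $x = 4$, $H_0$ is
not rejected.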
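The constants $C = 3$ and $\gamma = 0.96$ in Exercise 3.8 can be recovered without tables by accumulating Poisson probabilities under $\theta = 10$ until the level 0.01 would be exceeded. This is a minimal sketch of that search; the function names are illustrative.

```python
import math


def poisson_pmf(k: int, theta: float) -> float:
    """P(X = k) for X ~ P(theta)."""
    return math.exp(-theta) * theta ** k / math.factorial(k)


def lower_tail_randomized_test(theta0: float, alpha: float) -> tuple[int, float]:
    """Find C and gamma with P(X <= C - 1) + gamma * P(X = C) = alpha under theta0."""
    cdf_below = 0.0  # running value of P(X <= C - 1)
    c = 0
    while cdf_below + poisson_pmf(c, theta0) <= alpha:
        cdf_below += poisson_pmf(c, theta0)
        c += 1
    gamma = (alpha - cdf_below) / poisson_pmf(c, theta0)
    return c, gamma


C, gamma = lower_tail_randomized_test(theta0=10.0, alpha=0.01)
print(C, round(gamma, 2))
```

Since the observed value $x = 4$ exceeds $C$, the test function equals 0 there, which matches the conclusion that $H_0$ is not rejected.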


3.10 (i) Hp is rejected when > ;_, xi < C, C = no — 240 yM. 
Gi) (0) = OCP).
(iii) C = 43,252.5, Hp is rejected, and (1,700) = 0.841345. 
3.12 Hp is rejected when C < yy Zi < C2, where Cı and Ca are determined 
by ©(0.4+4+ x) — ©(0.4— x) = 0.05, x = ON From the Normal tables, x = 
0.07, so that C=0.7, and Ab is rejected when —0.7 < 21 + 22+ 23 +24 < 0.7. 


3.14 Here Hy) : o > 0.04, H; : o < 0.04, and the UMP test rejects Hy) when 
Dj (0 Y < C, C = of Xp1_q- For the numerical data, C = 0.0127392 
~ 0.013, and since ES (4 — w)? = 0.04, the hypothesis Ah is not rejected. 


Section 11.4
4.2 (i) à = (0.25)'(0.75)?-“/(4)'1 — $}, t = 0, 1, 2, 3, and Mh is rejected 


when à < C where C is determined by Po.25(à < C) = 0.02. 

Gi) At level œ = 0.02, Ho is outright rejected when à = 0.015625 (which 
is equivalent to t = 3) and is rejected with probability 0.02 — 0.0156 = 
0.0044 when A = 0.31640625 (which is equivalent to t = 2). 





4.4 (ii) With t(z) = NENAS Leo- a, = (2, ...,2n), Ho is re- 
jected when t(z) < —th-1,4 OF t(z) > tn—1;4- 
(iii) Here ts9:0.025 = 1.9870, and therefore Hp is rejected when 


[3V102|/,/ 4 Yi — D? > 1.9870. 


4.6 Here Hy: u = 2.5 and Hy: p # 2.5, and H, is rejected when ZE < 
-zaj or AEH) > 2/9. Since 2025 = 1.96 and ZE = _0.8, His 
not rejected. 

4.8 Here Hy : mı = Ma, Ha : wy Æ uo, and Ho is rejected when t(x, y) < 
—tnin—2:a/2 OF EX, Y) > tn4n—2;0/2, Where 
ix, y) = VZG - Dats IL - B+ Vays — 9) x= 
(M1, ---, Zm), Y= (U, ---, Yn). Since tyg.0.025 = 2.0106 and t(x, y) = —2.712, 
the hypothesis Hp is rejected. 








4.10 Hp is rejected when w(x, y) < F, —1,n-1;1-% OF UX, Y) > Fn-1,n- ;2, Where 
UX, Y= 5 ia D/A Vays 0 x= (Hy ---,%m), y = 
(Y, ---, Yn). Here F3 3,0.975 = 0.065, F3 3.0.025 = 15.439, and u(x, y) = 2.168. 
Therefore Ap is not rejected. 


4.12 The LR test is the same as that given in Exercise 4.8. For the given numerical
data, $t_{8;0.025} = 2.3060$, $t(x, y) = 2.014$, and therefore $H_0$ is not
rejected.


4.14 They follow by integrating by parts. 


4.16 With) = e"/2u"/2e-™/?, u> 0, wehaveA'(u)= den = B@M/2yz—le—mu/2 x 
(1 — y), so that X (w) > 0 if (0 <) u < 1 and X’ (u) < Oifu > 1. It follows 
that A(w) is strictly increasing for u < 1 and strictly decreasing for u > 1. 
Since 1'(u) = 0 gives u = 1 and Alar = —  < 0, it follows that A (u)
is maximized for u = 1. That 1(0) = 0 is immediate, and that 1(u) > 0 as 
u— œ follows by taking the limit of the ratio of derivatives of sufficiently 
high order. 


4.18 Maximization of (58) is equivalent to maximization of g = g(u, Tt) = 
— "5" log tee @i- py ¿A ¡(4—WY, where t = 0”. From ie = 
0, 2 = k , we find fi, = “men = Di MN + 10480). 











Nest dl me, oe = EL, and 2 2 evaluated of y = fi, and t = f, yield, 
respectively; -%2", 0, and — "5". Setting C for the 2 x 2 matrix of the 


second-order denvatives of g, oe have, for 41, 42 with A? + å% A 0: 


2 
Cu, mel) = men (a 4 2) <0, 
2 T 





2t 
so that C is negative definite, and hence (2, and ĉ are the MLE of u and 
T, respectively. 


4.20 From the assumptions made, it follows that: 


X-Y¥ m (Xi —X\" (YY NY 
———— ~ N(0, D); 2 - ) aoe ee (=) Na a 


o + i=l j=l 





ale 
alr 


independent, so that their sum is ~ x? , ,_ 2- This sum is also a ana 
of X—Y. It follows that 2 divided by [Z (4547+ E 


(m+n — 2) is distributed à V a The cancellation of o leads to the 
assertion made. 


4.22 Set c = (m+ MF /m?n? and d = , so that à = A(u) = c(du)™?/ 
(1 + duy"*™/?, That 4(0) = 0 is immediate. Next, à > 0 as u => ois 
1 
also plean Furthermore, die) = e Xx OL x (m ndu) = = 0 yields 
u= ™= = me D, call it uy. Also, “ag > 0 for u < w, and 4% < 0 for 
u > a; so that 1(u) increases for u < Uy and decreases for u > w. It 


follows that (uw) attains its maximum for u = Up. This maximum is 1. 
4,24 (i) Since aro) = == r)2-! < 0, f(r) is decreasing in r. 


(iv) Since aut = = qian > 0, w(r) is increasing in r. 


Section 12.1











1.2 (ii), (iii) $-2\log\lambda = 2\left(\sum_{i=1}^{12} x_i\log x_i - 335{,}490.304\right)$, where $x_i$ is the number
of births falling into the $i$th month. Finally, $-2\log\lambda \simeq 78.776$ and $\chi^2_{11;0.01} =
24.725$. The hypothesis $H_0$ is rejected.
1.4 (iii) Here $-2\log\lambda \simeq 27.952$, and $\chi^2_{4;0.05} = 9.488$. The hypothesis $H_0$ is
rejected. Also, $\chi^2_{4;0.01} = 13.277$ and the hypothesis $H_0$ is still rejected.


Section 12.2
2.2 Here $\chi^2_\omega \simeq 72.455 > 24.725 = \chi^2_{11;0.01}$, and $H_0$ is rejected.


2.4 Here $\chi^2_\omega \simeq 28.161 > 9.488 = \chi^2_{4;0.05}$, and $H_0$ is rejected. Also, $\chi^2_\omega \simeq
28.161 > 13.277 = \chi^2_{4;0.01}$, and $H_0$ is still rejected.


2.6 (i) Here $\chi^2_\omega = 4 > 2.706 = \chi^2_{1;0.1}$, and $H_0$ is rejected.
(ii) The P-value is approximately 0.047.
2.8 Here $\chi^2_\omega = 1.2 < 5.991 = \chi^2_{2;0.05}$, and $H_0$ is not rejected.


2.10 (ii) $p_{10} = 0.251429$, $p_{20} = 0.248571$, $p_{30} = 0.248571$,
$p_{40} = 0.15967$, $p_{50} = 0.069009$, $p_{60} = 0.02275$.
(iii) $\chi^2_\omega \simeq 51.161 > 11.071 = \chi^2_{5;0.05}$, and $H_0$ is rejected.


2.12 Here $\chi^2_\omega \simeq 1.668 < 11.071 = \chi^2_{5;0.05}$, and $H_0$ is not rejected.


Section 13.3 


3.2 1,0% = 91,22, YY = 15,228, >a? = 273.8244, Y a = 45,243.54. 
Also, SS; = 5.402, SS, ~ 101,339.419, and SS, ~ 433.922. 


3.4 $\mathrm{Var}(\hat\beta_1) \simeq 0.577\sigma^2$, $\mathrm{Var}(\hat\beta_2) \simeq 0.185\sigma^2$, and $\hat\sigma^2 \simeq 2{,}144.701$.


3.6 (1) It follows by differentiating and equating the derivatives to 0. 
(ii) —(x) As indicated. 


Section 13.4


4.2 From relations (24) and (25), we find $t \simeq -0.737$ and $t \simeq 0.987$. Since
$t_{29;0.025} = 2.0452$, none of the hypotheses is rejected.


4.4 (ii) Replacing the x;’s by the t;’s and taking into consideration that t = 0, 
we get from (5) and (8) the values specified for f and $. 
(iii) Immediate by the fact that t = 0. 
(iv) B i tn 258 Fa and y + tn— 2% Jem) where S = ySSg/(n— 2), SSg = 
ss: 2 
SSy — yg", SSy = NY -— (2 x Y,) Sy = Y Yi 88 = SS. 


_ B=fo + _ -n 
O) t= Sm! = 57885: 


2 2 
(Wi) Got to235/5 + sy, Dot togS/1+ i + sy 
Section 13.5


5.2 (i) $\hat y_0 = 517.474$.
(ii) $S \simeq 59.834$, $\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SS_x}} \simeq 0.205$, and $t_{32;0.025} = 2.0369$. The observed
confidence interval is [492.489, 542.459].
(iii) $\hat y_0 = 448.937$.
(iv) The observed confidence interval is [344.442, 553.432].
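The interval in Exercise 5.2(ii) is the point estimate plus or minus the product of the $t$ critical value, $S$, and the square-root factor. The sketch below redoes that arithmetic with the rounded quantities quoted in the answer (517.474, 2.0369, 59.834, 0.205); the function name is illustrative, and agreement is only up to the rounding of those inputs.

```python
def confidence_bounds(center: float, t_crit: float, s: float, factor: float) -> tuple[float, float]:
    """Return center -/+ t_crit * s * factor."""
    half_width = t_crit * s * factor
    return center - half_width, center + half_width


lower, upper = confidence_bounds(center=517.474, t_crit=2.0369, s=59.834, factor=0.205)
print(round(lower, 3), round(upper, 3))
```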


5.4 (i) $\hat y_0 = 15.374$.
(ii) The observed confidence interval is [14.329, 16.419].
(iii) $\hat y_0 = 15.374$.
(iv) The observed confidence interval is [14.278, 16.470].
5.6 (i) By ~ —9.768, Bo ~ 2.941, 6? ~ 0. es 
Gi) t11:0.025 = 2.201, S = 0.071, Ji pz + ~ 10.575, so that the observed 
confidence intervals for $\beta_1$ and $\beta_2$ are: [−11.421, −8.115] and [2.490,
3.392], respectively. Since $\chi^2_{11;0.025} = 21.92$ and $\chi^2_{11;0.975} = 3.816$, the
observed confidence interval for $\sigma^2$ is [0.003, 0.015].


(iii) Both EY, and Y, are predicted by 1.32. The respective observed con- 
fidence intervals are [1.254, 1.386] and [1.15, 1.49]. 


Chapter 14


Section 14.1


1.2 In (1), set wy = --- = ugr = u to obtain: 





IJ IJ ‘ 1 
log L(y; u, o°) === log@r) — = logo” — 55) Wy- uY. 
i j 


Set S(u) = X`; X (yy — 10", and observe that SM) = -2 ¿1 ¡(Yi — 
u) = 0 gives y = FD Yy = Y, and ¿¿S(u) = 213 > 0 for all 
values of u, so that à = y. minimizes S(u). Replacing u by 4 in the above 
expressions, and setting 


A 


a s IJ IJ 1 
=S), log Ly; fi, 07) = -= log(2r) logo’ S, 
2 2 20? 





we obtain of, = © from 5 log LY; fi, 0?) = 0. Also, 
e IJ 
lo L rm») di =="... Ss 0, 
do gL(y; ñ, ol do (04 


so that oF, = = = $ Di Du — y) = FF is the MLE of o? under Ap. 
1.4


m Section 14.2 


2.2 


L. Section 14.3 


3.2 


3.4 


Recall that for i = 1, ..., I and j =1,..., J, the r.v.'s Yi; are independent 

with EY; = m; and Var (Yy) = o°. 

G) From Y, = 75 DD y Yy, we have then EY. = 7) 0; } j EYy = } x 
jx m> = u, and therefore E(Y, — u}? = Var(Y.) = Var(4 Y; 
Ej Y) = gpl? =F. 

Gi) From Y; = ty, Yy, we have EY, = 5D) M4 = 3 ga 
that E(Y,, — AL Var(¥;,) = Var (4 Yi) = ~ y, o? 

Gi) EY; — u = EY: — m) + (1 — u)? = EM, — m)? + m =k); 
because E[(Y; —u:)Xui— n. )] = (mizu JEY — ui) = (mi~ u.)x0 = 0. 


Hi, SO 


ae l 


We have: fi; = 11.25, 42 = 17.00, fig = 15.50, fl ha = 14.75, so that y = 
11.25c, + 17.00c3 + 15.50c3 + 14.75c4. Also, Var (Y) = 1. 55204 da), and 


S? = 10.4709, so that S,/ Var) = 4.031,/ pen È . Therefore the required 
observed confidence interval is: 





4 
11.25c1 + 17.00c + 15.50c3 + 14.75c4 + 4.031 > a 


i=1 








By (31) and (32), La@;u, B, 0?) = a exp[—S( 1, 8)], where 
Su, B) = > Uyu- B7}. For each fixed o°, minimize S(u, 3) with 
respect to $\mu$ and the $\beta_j$'s subject to the restriction $\sum_j \beta_j = 0$. Doing this
minimization by using Lagrange multipliers, we find the required MLE's;
namely, fia = Y, Bj a=Y-j7-Y,I=1,...,d. 

(i) That y = X’G is immediate, and from this it follows that y lies in the 
vector space generated by the columns (rows) of X’. 

Gi) Here 7+ J+1< JJ, or J > TŁ, provided I > 2. Thus, for J > 2 and 
J > (I+ 1)/Q — 1), it follows that minfJ + J+ 1, IJ} = I+J+1, 
and hence rank X’ < I+J+1. 

Gii) Parts (a) and (b) are immediate. It then follows that rank X’ < I + 
J— 1. To see part (c), multiply the columns specified by the respective 
scalars 41, 42, ..., QI—1, b1, ..., by and add them up to obtain 


(bı, b2, ..., bJ, 41 + bi, Qi + b2, ..., Q1 +by,..., 7-1 +01, Ar-1 
+ bo,...,a7-1+ 07), 





and this vector is zero if and only if b} = ---=b; = 0 = q = -= 
ar—ı. The conclusion of independence follows. 
So, n, although it has IJ coordinates, belongs in an (J + J — 1)-dimensional 
space (J + J — 1 < IJ), and therefore the dimension of 7 is J + J — 1. 
2-dimensional normal distribution, 143, 147
2-dimensional random vector, 110, 138 


A 
a priori probabilities, 45, 271 
Acceptance region, 233, 361 
Analysis of variance, 235, 397 
basics, 236-238 
column effects, 418 
contrasts, 237, 407, 408 
examples, 403-404, 409-413, 419-420 
general linear models, 421 
multicomparison method, 407-412 
one-way (classification) layout model. 
See One-way (classification) layout 
model 
row effects, 418 
tables for, 402, 418 
two models of, 397-427 
two-way (classification) layout model. 
See Two-way (classification) layout 
model 
uses, 236-237 
ANOVA. See Analysis of variance 
Asymptotically 
normal, 444, 445, 446 
unbiased, 443, 445 
Axiomatic definition of probability, 25-26 


B 
Bayes approach, 272 


decision function, 359 
estimates, 271, 274, 278 
formula, 46-48 
Best linear predictor, 137 
Beta 
p.d.f., 273 
expectation, 275 
Binomial distribution, 79-81 
application, 315-316 
cumulative table, 450-457 


expectation, 81, 477 

graphs, 80 

m.g.f., 81, 479 

point estimation and, 228 

Poisson distribution relationship to, 84-85, 

98 

p.d.f., 79, 477 

variance, 81, 477 
Binomial experiment, 79, 140 
Bivariate normal distribution, 143-146 

correlation coefficient, 148 

example, 145-146 

expectations, 148, 478 

graph, 143 

m.g.f., 146, 479 

p.d.f., 143, 478 

variances, 148, 478 


C 
Cauchy distribution, 76, 325 
Cauchy-Schwarz inequality, 131 
Center of gravity, 69 
Central limit theorem (CLT), 90, 208, 210-213, 
225 
applications, 213-215 
binomial and, 214 
confidence intervals and, 295 
continuity correction and, 215-217 
examples, 212, 214-215 
normal approximation and, 211 
Chi-square distribution 
critical values for, 465-466 
expectation, 89, 477 
graph, 90 
m.g.f., 89, 479 
p.d.f., 89, 477 
variance, 89, 477 
Chi-square goodness-of-fit test. See Goodness- 
of-fit test 
Combinations, 61 
Completeness, 264 
Index


Concept of probability and basic results. 
See Probability and basic results, 
concept of 

Conditional and marginal probability density 
functions, conditional expectation and 
variance, 117-126 

examples, 118-123 

exercises, 123-126 

k random variables and, 138 
multinomial distribution and, 140 

Conditional expectations, 138 

Conditional probability and related results, 
41-51 

Bayes formula, 46-48 
definition, 42 
examples, 42, 43-44, 45 
exercises, 48-51 
multiplicative theorem, 44 
total probability theorem, 45-46 
Conditional 
expectation, 120 
variance, 123, 138 
Confidence intervals, 228, 282-292 
with approximate confidence coefficient, 
282, 294-298 
construction steps, 283, 407-408 
definition, 282 
examples, 283-285, 289-292, 295-296 
exercises, 285-289, 292, 296-298 
for quantiles of distribution function, 
431-433 
in linear regression model, 374-375, 383-389 
nuisance parameters, presence of, 289-292 
random interval and, 230-231 
significance of, 282 
with given approximate confidence 
coefficient, 294-296, 429-431 
Confidence regions, 292-294, 360 
confidence interval and, 231 
examples, 292-294, 361-362 
testing hypotheses relationship to, 
360-362 
theorem, 361 

Contingency tables, likelihood ratio tests in 
multinomial case and, 343, 345-348 

Continuity correction, 215-217 

Continuous case distribution, 86-95 

chi-square, 89, 90 

gamma, 86-87 

median and, 102, 103 

negative exponential, 88-89 
normal, 89, 90-93 

uniform (or rectangular), 94-95 

Continuous sample spaces, 13 

Convergence modes of random variables, 
applications, 202-226 


central limit theorem, 210-215 
continuity correction, 215-217 
in distribution or in probability, 202-208 
further limit theorems, 222-226 
weak law of large numbers, 208-210 
Correlation coefficient, 132 
Counting, basic concepts and results in, 
59-67 
exercises, 64-67 
fundamental principle of counting, 59-60, 
61-62 
problem of counting examples, 59, 62-64 
Covariance, 126, 129-135 
Cramér-Rao (C-R) inequality, 262-263, 265, 266 
examples, 264 
usage, 263 
Cramér-Wold device, 139
Critical or rejection region, 233 
Critical values for chi-square distribution table, 
465-466 
Critical values for F-distribution table, 467-476 
Critical values for student's t-distribution table, 
463-464 
Cumulative binomial distribution table, 
450-457 
Cumulative Poisson distribution table, 458-459 
Curve estimation, nonparametric, 442-449 


D 
Decision-theoretic approach to estimation, 229, 
270-277 
Bayes estimate, 271, 274 
decision function, 272 
examples, 273, 274-275 
exercises, 275 
loss function, 270 
minimax, 270, 274, 275 
risk function, 270 
theorems, 271, 274 
Decision-theoretic approach to testing 
hypotheses, 234, 353-360 
Bayes decision function, 359 
examples, 356-359 
loss function, 354 
minimax, 354-355 
nonrandomized decision function and 
353-354 
risk function, 354 
theorem, 355 
Delta method, 225 
DeMorgan's laws, 15, 16 
Dependent events, 51 
r.v.'s, 151
Discrete case distributions, 79-86 
binomial, 79-81 




geometric, 81-82, 83 
hypergeometric, 85-86 
median and, 102-103 
Poisson, 83-85 
Discrete sample spaces, 13 
Disjoint, 14 
Dispersion, 71 
Distribution(s) 
2-dimensional normal, 143, 147 
bivariate normal, 143-146 
characteristics table, 477-479 
confidence intervals for quantiles of, 
431-433 
convergence, 202-222 
function (d.f.), 34-35 
graphs, 34-35, 203 
joint, 110-117 
k-variate normal, 146, 147 
marginal and conditional, 117-126 
multinomial, 140-142, 143 
multivariate normal, 146-147 
of random variable X properties, 34, 37 
probability, 34, 69 
reproductive property of certain 
distributions, 159-167 
Distribution of random variables, 33—41. 
See also Distribution and Probability 
density function 
beta, 273 
binomial, 79-81 
Cauchy, 76, 325 
chi-square, 89 
continuous, 86-95 
discrete, 79-86 
double exponential, 247 
exercises, 39-41 
F, 179
function graphs, 34-35 
gamma, 86-87 
geometric, 81-82, 83 
hypergeometric, 85-86 
mode, 104-106 
negative exponential, 88-89 
normal, 89, 90-93 
Poisson, 83-85 
t, 177 
uniform (or rectangular), 94-95 
Weibull, 99 


E 
Effects 
column, 412, 418 
row, 412, 418 
Error(s) sum of squares, 365, 402, 418 
type I, 233
type II, 233 
Event(s), 8
certain, 8 
complement of, 13 
difference of, 14 
disjoint, 14, 26 
happens, 8 
impossible, 8 
intersection of, 13, 14 
monotone, 16-17 
occurs, 8 
related results and independent events, 
51-59 
union of, 13 
Expectation of random variables, 68-77 
definition, 69 
examples, 70-74 
exercises, 74-77 
of selected discrete and continuous 
distributions, 477, 478 
Exponential type of families of probability 
density functions, 307-308 


F 
F distribution 
critical values for, 467-476 
expectation, 181 
graph, 180 
p.d.f., 179 
variance, 181 
Factorization theorem, 151-152, 154 
Failure rate, 99 
Fisher information, 259 
Fisher-Neyman factorization theorem, 256 
Fitted regression line, 366, 370, 372, 377, 
378 
Fundamental concepts, 8-19 
events, 8 
exercises, 17-19 
intersection of events, 13, 14 
mutually or pairwise disjoint, 14 
random experiment, 8 
sample points, 8, 13 
sample space, 8, 13 
union of events, 13 
Venn diagram, 8, 9, 13-15 
Fundamental principle of counting theorem, 
60, 61 
corollary to, 61 
proof of, 61-62 
proof of corollary, 62 
Further limit theorems, 222-226 


G 

Gamma distribution 
expectation, 87, 477 
graph, 86-87 
m.g.f., 87, 479 
p.d.f., 86, 477 
variance, 87, 477 
Gamma function, 86 
recursive property, 87 
Geometric distribution 
expectation, 82, 83, 477 
graphs, 82 
m.g.f., 82, 83, 479 
p.d.f., 82, 83, 477 
variance, 82, 83, 477 
Goodness-of-fit test, 234, 349-353 
exercises, 351-353 
multinomial distribution and, 349 
numerical examples, 350-351 
Graduate Management Aptitude Test (GMAT), 
7, 364, 384 


H 


Hypergeometric distribution 
expectation, 85, 477 
p.d.f., 85, 477 
variance, 85, 477 


I 
Independence of random variables and some 
applications, 150-167 
criteria, 150-159 
definition, 151 
examples, 152-153, 162-163 
exercises, 156-159, 164-167 
factorization theorem, 151-152, 154 
reproductive property, 159-167 
Independent events and related results, 
51-55 
definitions, 51, 52-53, 55 
examples, 52, 54-55 
exercises, 55-59 
theorem, 53 
Interval estimation, 228 
basics, 230-231 
confidence interval, 231 
lower confidence limit, 231 
random, 230-231, 282 
statistic(s), 230 
upper confidence limit, 231, 282 
Inversion formula, 73, 129 


J 
Joint moment generating function, 126-128, 
146 
Joint probability distribution functions, 
110-117, 137 
examples, 111-115 
exercises, 115-117 
k random variables and, 137-138 


K 


k random variables 
generalizations, 137-139 
sample mean and sample variance of, 159 
k-variate normal distribution, 146, 147, 478 
kernel-estimation approach, 239, 442, 443 
Kolmogorov, 25 


L 
Lagrange multipliers, 409 
Least squares (LS’s), 230, 277 
estimate (LSE), 230, 363, 366-374 
examples, 369-374 
fitted regression line, 366, 370, 372 
minimize sum of squares error, 365-366 
pairs relationship, 364-365 
principle of, 363, 365-366 
regression line, 366 
theorems, 366, 371, 373 
Lehmann-Scheffé theorem, 265 
Level of significance of test employed, 
233, 234 
Likelihood equation(s), 242, 245 
Likelihood function, 229, 234, 241, 247, 254, 375, 
398, 413 
maximum, 325 
Likelihood ratio (LR) tests, 234, 299, 324-342 
applications, 327-337, 345-347 
examples, 325-327 
exercises, 337-342, 348-349 
in multinomial case and contingency tables, 
343-349 
linear regression model and, 384 
motivation, 235, 324 
normal case applications, 327-337 
numerical examples, 328, 330, 331, 335, 344, 
348 
one-way layout and, 398 
theorem, 343-344 
two-way layout and, 415 
Linear regression model, simple, 236, 363-396 
concluding remarks, 395-396 
confidence intervals, 374 
confidence intervals and hypotheses testing 
problems, 383-389 
errors, normally distributed, 374-383, 
393-395 
examples, 369-370, 376-379, 384-386 
fitted regression line, 366, 370, 372, 377 
general, 396, 421 
least squares estimates of β1 and β2, 
366-374 
least squares principle, 365-366 
likelihood ratio tests and, 384 
multiple, 396 


pairs relationship, 364-365 
prediction problems, 389-393 
regression line, 366 
setting up, 364-366 
theorems, 366, 371, 373, 375, 379, 383, 384, 
390, 391 

Linear transformations, 185-191 
definition, 186 
exercises, 190-191 
orthogonal, 186 
theorems, 187, 189 

Lower confidence limit, 231, 282 


M 
Marginal and conditional probability density 
functions, conditional expectation and 
variance, 117-126 
examples, 118-123 
exercises, 123-126, 139, 149 
k random variables and, 138 
multinomial distribution and, 140 
Marginal moment generating function, 128 
Markov inequality, 77 
Mathematical expectation. See Expectation of 
random variables 
Matrix 
orthogonal, 186 
transpose, 186 
Maximum likelihood estimates (MLE’s), 229, 
278 
definition, 241, 242 
Fisher-Neyman factorization theorem, 256 
identification of, 242 
invariance property, 254 
linear regression model and, 374-383 
motivation and examples, 240-253 
one-way layout and parameters, 399 
properties, 253-261 
sufficient statistics, 257-258 
theorems, 253, 254, 256, 258-259 
two-way layout and parameters, 413-414 
Maximum likelihood function, 324-326 
Mean, mean value. See Expectation of random variables 
Measure of 
dispersion, 71 
location, 69, 71 
Median and mode of random variables, 
102-108 
continuous case, 102, 103 
definitions, 103, 104 
discrete case, 102-103 
examples, 102-106 
exercises, 106-108 
Method of moments, 229, 277 
exercises, 278-279 
Minimax 
decision function, 354-357 
estimate, 270, 274, 275, 278 
Minimum chi-square method, 277 
Mode of distribution, 104-106 
Moment estimates, 277 
Moment generating function (m.g.f.), 72-74, 93 
inversion formula and, 73, 129 
joint, 126-129, 138, 142, 146 
marginal, 128, 130 
of selected discrete and continuous 
distributions, 479 
Moments, 71 
Monotone events, 16-17 
Most powerful (MP) test, 234, 303 
Motivating examples, 1-8 
Multicomparison method of analysis of 
variance, 407-412 
Multinomial distribution 
correlation coefficients, 143 
examples, 140-143 
expectations, 142, 478 
goodness-of-fit test and, 349 
likelihood ratio tests and, 343-349 
marginal and conditional probability density 
functions and, 140 
m.g.f., 142, 479 
p.d.f., 140, 478 
theorem, 141 
variances, 142-143, 478 
Multiple random variables, transforming, 
173-185 
Multiplicative theorem, 44 
Multivariate normal distribution, 146-147 


N 


Negative exponential distribution, 88 
expectation, 88, 477 
graph, 88 
m.g.f., 88, 479 
p.d.f., 88, 477 
variance, 88, 477 
Negatively correlated, 133 
Neyman-Pearson fundamental lemma, 299, 
302-307 
application examples, 305-307 
most powerful test and, 303, 305-306 
proof of theorem, 303-304 
uniformly most powerful test and, 306 
Nonparametric curve estimation, 442-449 
Nonparametric inference, topics in, 
428-449 
basics, 238-239 
confidence intervals with given confidence 
coefficient, 429-431 
confidence intervals for quantiles of 
distribution function, 431-433 
curve estimation, 442-449 
definition, 227 
kernel method, 239, 442 
probability density function estimation, 
442-444 
rank sum tests, 435-438, 440-442 
regression estimation, 444-447 
two-sample sign test, 433-435 
weak law of large numbers and, 239 
Wilcoxon-Mann-Whitney test, 438 
Nonrandomized decision function, 353 
Nonrandomized test, 233, 303 
Normal distribution, 89, 90-93, 95 
central limit theorem, 90 
expectation, 92, 477 
graphs, 90, 91 
importance of, 89, 90 
likelihood ratio tests and, 327-337 
m.g.f., 93, 479 
p.d.f., 90, 477 
standard, 91 
table, 460-462 
testing hypotheses about the mean, 317-319, 
320-321 
testing hypotheses about the variance, 
319-320 
variance, 92, 477 
Nuisance parameters, confidence intervals in 
presence of, 289-292 
Numerical characteristics of random 
variables, 68-108 
expectation, variance, and moment 
generating function, 68-77 


O 
One-way (classification) layout model analysis 
of variance, 237, 397, 398-407 
Order statistics, 193-201 
definition, 193 
examples, 196, 197-199 
exercises, 199-201 
theorems, 194, 196 


P 


Parameter space, 227, 228 
Parametric statistical inference, 
227 
Partition, 45 
Permutations, 61 
Point estimation, 227, 240-280 
basics, 228-230 
binomial distribution and, 228 
decision-theoretic method, 229, 270-277 
least squares, 230, 277, 364-374 
maximum likelihood estimate, 229, 
240-246 
maximum likelihood estimation motivation 
and examples, 240-253 
maximum likelihood estimation properties, 
253-261 
method of moments, 229, 277 
other methods, 277-280 
parameter(s), 228, 240 
Poisson distribution and, 228 
unbiasedness, 229 
uniformly minimum variance unbiased, 229, 
261-270 
Point of equilibrium, 69 
Poisson distribution, 83-85, 95, 215 
application, 316-317 
binomial distribution relationship to, 84-85 
cumulative table for, 458-459 
expectation, 84, 477 
graph, 83 
m.g.f., 84, 479 
point estimation and, 228 
p.d.f., 83, 477 
uses for, 84 
variance, 84, 477 
Principle of least squares. See Least squares 
(LS's) 


Probability 
axiomatic definition, 25-26 
classical definition, 24-25 
conditional, 41 
inequalities, 77-79 
justification of basic properties, 28-29 
relative frequency definition, 25 
Probability and basic results, concept of, 
23-67 
conditional probability and related results, 
41-51 
counting, basic concepts and results in, 
59-67 
definition, 24-26 
examples in calculating probabilities, 26-31 
independent events and related results, 
51-59 
random variable distribution, 33-41 
theorems, 31, 44, 45, 46-47, 53, 60 
Probability density function (p.d.f.), 37 
definition, 35, 36 
graph examples, 80, 82, 83, 86-88, 90, 91, 94, 
178, 180 
nonparametric estimation of, 442-444 
of selected discrete and continuous 
distributions, 477-479 
probability inequalities, 77 


Probability integral transform, 192-193 
Prior probabilities, 45 


Q 


Quantile(s), 103 


R 


Random experiment, 8 
examples, 2-8 
Random interval, 230-231, 282 
Random sample, 155 
Random variables (r.v.'s) 
continuous, 36, 86-95 
convergence modes of, 202-226 
definition, 20 
degrees of freedom (d.f.), 34 
denoting a, 20 
discrete, 36, 79-86 
distribution, 33-41 
exercises, 21-22 
expectation, variance, and moment 
generating function, 68-77 
independence of, 150-167 
introduction of, 19-21 
k, generalization to, 137-139 
median and mode of, 102-108 
numerical characteristics of, 68-108 
special, 79-101 
transformation of, 168-201 
types of, 21 
Randomized tests, 233, 303 
Rank sum test, 435-438, 440-441 
Rank test, 239, 429, 435-438, 440-442 
Rao-Blackwell theorem, 265 
Recursive relation for 
binomial p.d.f., 95, 96 
gamma function, 87 
hypergeometric p.d.f., 98 
Poisson p.d.f., 97 
Regression analysis 
basics, 235-236 
linear regression model, 236, 
363-396 
simplest form, 235-236 
Regression estimation, nonparametric, 
444-447 
Regression line, 366 
Regression model 
fixed design, 447 
linear. See Linear regression model, simple 
stochastic design, 447 
Relative frequency definition of probability, 
25 
Reproductive property of certain distributions, 
159-167 
examples, 162-163 
exercises, 164-167 
theorems, 160-161, 164 
Risk function, 270, 354 


S 


Sample mean, 159 
Sample points, 8, 13 
Sample range, 200 
Sample space, 8 
continuous, 13 
discrete, 13 
examples with countably infinite points, 
10-11 
examples with finitely many sample points, 
9-10 
examples with nondegenerate finite or 
infinite intervals in real line, 11 
random experiment and, 19, 20 
Sample variance, 159 
Sign test, 239, 433-435 
Single random variables, transforming, 168-173 
Standard deviation (s.d.), 71-72, 77 
Statistic, 229 
sufficient, 256 
Statistical analysis. See Analysis of variance 
Statistical hypothesis, 232 
alternative, 232 
null, 232 
Statistical inference overview, 227-239 
aim, 227 
analysis of variance basics, 236-238 
interval estimation basics, 228, 230-231 
nonparametric inference basics, 227, 
238-239 
parametric, 227 
point estimation basics, 227, 228-230 
regression analysis basics, 235-236 
testing hypotheses basics, 228, 231-235 
Stirling formula, 185 


T 

t distribution, 177, 178 
expectation, 172 
p.d.f., 177 
variance, 172 
Tchebichev inequality, 77, 208 
Test function, 232 
Testing hypotheses, 228, 299-342, 343-362 
acceptance region, 233, 361 
basics, 231-235 
binomial case application, 315-316 
concepts, general, 300-302 
confidence regions relationship to, 360-362 
critical or rejection region, 233 
decision-theoretic approach, 234, 353-360 
equality of means, 399-404 
exponential type families of probability 
density functions, 307-308 
for parameters in single normal population, 
327-332 
for parameters in two normal populations, 
332-337 
formulating, 300-302 
goodness-of-fit test, 234, 349-353 
level of significance, 233, 234 
likelihood ratio tests, 234, 235, 299, 324-342 
linear regression model and problems in, 
383-389, 389-392 
most powerful test, 234, 303, 305, 306 
Neyman-Pearson fundamental lemma, 299, 
302-307 
nonrandomized test, 233, 303 
normal case application, 317-321 
p-value (probability value), 235 
Poisson case application, 316-317 
power, 234, 304, 309, 311, 312, 318, 319, 321, 
328, 340 
randomized tests, 233, 303 
statistical hypotheses, 232 
two-way layout, 414-420 
type I error, 233 
type II error, 233-234 
uniformly most powerful tests, 234, 299, 301, 
306 
uniformly most powerful tests for composite 
hypotheses, 308-312, 315-321 


Theorems, 31-32, 104, 105, 131, 134, 135 


Bayes formula, 46-48, 271, 359 
Cauchy-Schwarz inequality, 131 
central limit, 90, 208, 210-215 
confidence interval for contrasts, 408 
LSE’s (MLE’s), 383 
predictor, 390, 391 
quantiles, 432 
confidence regions and testing hypotheses, 
361 
continuity, 206 
convergence in distribution, 203, 206 
convergence in probability, 204, 206 
convergence of MLE, 258, 259 
correlation coefficient, 134 
Cramér-Rao inequality, 263 
decision-theoretic approach to estimation, 
271, 274 
decision-theoretic approach to testing 
hypotheses, 355 
decomposition of total variability, 373-374 
distribution of LSE’s (MLE’s), 375 
distribution of sums of squares, 379, 393-395 
factorization, 151-152, 154 
Fisher-Neyman factorization, 256 
fundamental principle of counting, 60, 
61-62 
further limit, 222-226 
independence of sample mean and sample 
variance in a normal distribution, 
163, 189 
independent events, 53 
invariance property of MLE, 253, 254 
LSE’s, 366 
likelihood ratio tests, 343-344 
linear regression model, 366, 370, 373, 375, 
379, 383, 384, 390, 391 
linear transformations, 187, 189 
maximum likelihood estimates, 253, 254, 
258-259 
minimax decision function, 355-356 
minimax estimate, 274 
mode, 104, 105 
multicomparison method in analysis of 
variance, 407 
multinomial distribution, 141 
multiplicative, 44 
Neyman-Pearson fundamental lemma, 
302-303 
nonparametric curve estimation, 443-444 
nonparametric inference, 432 
nonparametric regression estimation, 445, 
446 
one-way layout model, 399, 401 
order statistics, 194, 196 
probability inequalities, 77, 78 
probability integral transform, 192 
rank sum test, 437 
Rao-Blackwell and Lehmann-Scheffé, 265 
reproductive property of distributions, 
160-161, 164 
sign test, 434 
Slutsky, 223 
testing hypotheses in linear regression 
model, 384-385 
total probability, 45-46 
transforming multiple random variables, 174, 
182-183 
transforming single random variable, 169, 
170, 171 
two-sample sign test, 434 
two-way layout, 414, 418 
uniformly most powerful test for composite 
hypotheses, 308-309, 311 
variance of LSE’s, 371 
variance of sums of r.v.'s, 135, 139 
WLLN, 208 
Wilcoxon-Mann-Whitney test, 438 


Total probability theorem, 45-46 
Transformation of random variables, 
168-201 
examples, 168, 170-171, 173, 174-181, 
192-193, 196-199 
exercises, 171-173, 183-185, 190-191, 193, 
199-201 
linear, 185-191 
order statistics, 193-201 
orthogonal, 186 
probability integral transform, 192-193 
single, 168-173 
two or more, 173-185 
Triangular probability density function, 
176 
Two-sample sign test, 433-435 
Two-way (classification) layout model, 238, 
412-420 
examples, 412-413, 419, 420 
exercises, 425-427 
lemmas and proof, 413-414, 415-418, 
420-425 
maximum likelihood estimates parameters, 
413-414 
table, 418 
testing hypotheses, 414-420 
theorems, 414, 418 
with one observation per cell, 412-427 


U 
Unbiasedness, 229 
Uniform (or rectangular) distribution, 94-95, 
477 
expectation, 94, 477 
m.g.f., 94, 479 
p.d.f., 94, 477 
variance, 94, 477 
Uniformly minimum variance unbiased 
(UMVU) estimates, 229, 261-270, 278 
completeness, 264 
Cramér-Rao inequality, 262-264 
definition, 261, 262 
desirability of, 262 
examples, 261-262, 264, 265-266 
Rao-Blackwell and Lehmann-Scheffé 
theorems, 265 
Uniformly most powerful (UMP) tests, 234, 299, 
301 
exercises, 313-314 
for composite hypotheses, 308-312 
Neyman-Pearson fundamental lemma and, 
306 
power for one-sided hypotheses, 310 
power for two-sided hypotheses, 312 
Upper confidence limit, 231 


V 
Variance analysis basics, 236-238 
Variance analysis models. See Analysis of 
variance 
Variance of random variables, 71-72 
of selected discrete and continuous 
distributions, 477, 478 
Venn diagram, 9, 13-15 


W 
Weak law of large numbers (WLLN), 208-210, 
224, 225, 277 
applications, 209-213 
confidence intervals and, 295 
example, 212-213 
interpretation and most common use, 209 
nonparametric inference and, 239 
theorem, 208 
Weibull distribution, 99 
Wilcoxon-Mann-Whitney test, 429, 438 
examples, 439-440 
Table of Selected Discrete and Continuous Distributions and Some of their Characteristics 


PROBABILITY DENSITY FUNCTIONS IN ONE VARIABLE 


Distribution: Probability Density Function 

Binomial, B(n, p): $f(x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, \ldots, n$; $0 < p < 1$, $q = 1 - p$ 
(Bernoulli, B(1, p)): $f(x) = p^x q^{1-x}$, $x = 0, 1$ 
Geometric: $f(x) = p q^{x-1}$, $x = 1, 2, \ldots$ 
Poisson, P(λ) 
Hypergeometric 
Gamma 
Negative Exponential 
Chi-Square 
Normal, N(μ, σ²) 
(Standard Normal, N(0, 1)) 
Uniform, U(α, β) 


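PROBABILITY DENSITY FUNCTIONS IN MANY VARIABLES 


Distribution: Probability Density Function; Means; Variances 

Multinomial: 
$f(x_1, \ldots, x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, $x_j \ge 0$ integers, $x_1 + x_2 + \cdots + x_k = n$; $p_j > 0$, $j = 1, 2, \ldots, k$, $p_1 + p_2 + \cdots + p_k = 1$ 
Means: $np_1, \ldots, np_k$ 
Variances: $np_1 q_1, \ldots, np_k q_k$, $q_j = 1 - p_j$, $j = 1, \ldots, k$ 

Bivariate Normal: 
$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left(-\frac{q}{2}\right)$, 
$q = \frac{1}{1 - \rho^2} \left[ \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x_1 - \mu_1}{\sigma_1}\right) \left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right]$, 
$x_1, x_2 \in \mathbb{R}$; $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1, \sigma_2 > 0$, $-1 < \rho < 1$, $\rho$ = correlation coefficient 
Means: $\mu_1, \mu_2$ 
Variances: $\sigma_1^2, \sigma_2^2$ 

k-Variate Normal, N(μ, Σ): 
$f(\mathbf{x}) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$, 
$\mathbf{x} \in \mathbb{R}^k$; $\boldsymbol{\mu} \in \mathbb{R}^k$, $\Sigma$: $k \times k$ nonsingular symmetric matrix 
Means: $\mu_1, \ldots, \mu_k$ 
Covariance matrix: $\Sigma$ 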


Distribution: Moment Generating Function 

Binomial, B(n, p): $M(t) = (pe^t + q)^n$, $t \in \mathbb{R}$ 
(Bernoulli, B(1, p)): $M(t) = pe^t + q$, $t \in \mathbb{R}$ 
Geometric 
Poisson, P(λ) 
Hypergeometric 
Gamma 
Negative Exponential 
Chi-Square 
Normal, N(μ, σ²) 
(Standard Normal, N(0, 1)) 
Uniform, U(α, β) 
Multinomial 
Bivariate Normal 
k-Variate Normal, N(μ, Σ) 